AUDIO PROCESSING SYSTEM AND METHOD FOR PROCESSING AN AUDIO BIT STREAM
Patent abstract:
The present invention relates to an audio processing system (100) comprising a front-end component (102, 103) that receives quantized spectral coefficients and performs inverse quantization, yielding a time domain representation of an intermediate signal. The audio processing system further comprises a frequency domain processing stage (104, 105, 106, 107, 108) configured to provide a time domain representation of a processed audio signal, and a sample rate converter (109) that provides a reconstructed audio signal sampled at a target sampling frequency. The respective internal sampling rates of the time domain representation of the intermediate audio signal and of the time domain representation of the processed audio signal are equal. In particular embodiments, the processing stage comprises a parametric upmix stage that is operable in at least two different modes and is associated with a delay stage that guarantees a constant total delay. Publication number: BR112015025092B1 Application number: R112015025092-0 Filing date: 2014-04-04 Publication date: 2022-01-11 Inventors: Lars Villemoes; Kristofer Kjoerling; Heiko Purnhagen Applicant: Dolby International AB IPC main class:
Patent description:
CROSS-REFERENCE TO RELATED APPLICATIONS [0001] This application claims priority to US Provisional Patent Applications 61/809,019, filed on April 5, 2013, and 61/875,959, filed on September 10, 2013, each of which is incorporated herein by reference in its entirety. TECHNICAL FIELD [0002] This disclosure generally pertains to audio encoding and decoding. Various embodiments provide audio encoding and decoding systems (referred to as audio codec systems) that are particularly suitable for speech encoding and decoding. BACKGROUND [0003] Complex technological systems, including audio codec systems, typically evolve cumulatively over an extended period of time, and often through uncoordinated efforts in independent research and development groups. As a result, such systems can include odd combinations of components representing different design paradigms and/or uneven levels of technological progress. The frequent desire to preserve compatibility with legacy equipment places an additional constraint on designers and can result in a less coherent system architecture. In parametric multichannel audio codec systems, backward compatibility may involve, in particular, providing an encoded format in which the downmix signal returns an acceptable sound output when played back on a legacy mono or stereo playback system without parametric processing capabilities. [0004] Available audio coding formats representing the state of the art include MPEG Surround, USAC and HE-AAC v2. These have been thoroughly described and analyzed in the literature. [0005] It would be desirable to propose a versatile audio codec system with a uniform architecture and reasonable performance, especially for voice signals. BRIEF DESCRIPTION OF THE DRAWINGS [0006] Embodiments within the inventive concept will now be described in detail, with reference to the attached drawings, in which: [0007] Figure 1 is a generalized block diagram showing the overall structure of an audio processing system, according to an exemplary embodiment; [0008] Figure 2 shows processing paths for two different mono decoding modes of the audio processing system; [0009] Figure 3 shows processing paths for two different parametric stereo decoding modes, one without and one with enhancement of the upmix by waveform-coded low-frequency content; [0010] Figure 4 shows a processing path for a decoding mode in which the audio processing system processes a fully waveform-coded stereo signal with discretely coded channels; [0011] Figure 5 shows a processing path for a decoding mode in which the audio processing system provides a five-channel signal by parametrically upmixing a three-channel downmix signal after applying spectral band replication; [0012] Figure 6 shows the structure of an audio processing system, according to an exemplary embodiment, as well as the internal workings of a component in the system; [0013] Figure 7 is a generalized block diagram of a decoding system, according to an exemplary embodiment; [0014] Figure 8 illustrates a first part of the decoding system in Figure 7; [0015] Figure 9 illustrates a second part of the decoding system in Figure 7; [0016] Figure 10 illustrates a third part of the decoding system in Figure 7; [0017] Figure 11 is a generalized block diagram of a decoding system, according to an exemplary embodiment; [0018] Figure 12 illustrates a third part of the decoding system of Figure 11; and [0019] Figure 13 is a generalized block diagram of a decoding system, according to an exemplary embodiment;
[0020] Figure 14 illustrates a first part of the decoding system in Figure 13; [0021] Figure 15 illustrates a second part of the decoding system in Figure 13; [0022] Figure 16 illustrates a third part of the decoding system in Figure 13; [0023] Figure 17 is a generalized block diagram of a coding system, according to a first exemplary embodiment; [0024] Figure 18 is a generalized block diagram of a coding system, according to a second exemplary embodiment; [0025] Figure 19a shows a block diagram of an exemplary audio encoder that provides a bit stream at a constant bit rate; [0026] Figure 19b shows a block diagram of an exemplary audio encoder that provides a bit stream at a variable bit rate; [0027] Figure 20 illustrates the generation of an exemplary envelope based on a plurality of blocks of transform coefficients; [0028] Figure 21a illustrates exemplary envelopes of blocks of transform coefficients; [0029] Figure 21b illustrates the determination of an exemplary interpolated envelope; [0030] Figure 22 illustrates exemplary sets of quantizers; [0031] Figure 23a shows a block diagram of an exemplary audio decoder; [0032] Figure 23b shows a block diagram of an exemplary envelope decoder of the audio decoder of Figure 23a; [0033] Figure 23c shows a block diagram of an exemplary subband estimator of the audio decoder of Figure 23a; [0034] Figure 23d shows a block diagram of an exemplary spectrum decoder of the audio decoder of Figure 23a; [0035] Figure 24a shows a block diagram of an exemplary set of admissible quantizers; [0036] Figure 24b shows a block diagram of an exemplary dithered quantizer; [0037] Figure 24c illustrates an exemplary selection of quantizers based on the spectrum of blocks of transform coefficients; [0038] Figure 25 illustrates an exemplary scheme for determining a set of quantizers in an encoder and in a corresponding decoder; [0039] Figure 26 shows a block diagram of an exemplary scheme for decoding entropy-encoded quantization indices that were determined using a dithered quantizer; and [0040] Figure 27 illustrates an exemplary bit allocation process. [0041] All Figures are schematic and generally show only those parts that are necessary in order to elucidate the invention, while other parts may be omitted or merely suggested. DETAILED DESCRIPTION [0042] An audio processing system accepts an audio bit stream segmented into frames carrying audio data. The audio data may have been prepared by sampling a sound wave and transforming the electronic time samples thus obtained into spectral coefficients, which are then quantized and encoded in a format suitable for transmission or storage. The audio processing system is adapted to reconstruct the sampled sound wave in a single-channel, stereo or multichannel format. As used herein, an audio signal may refer to a pure audio signal or to the audio portion of a video, audiovisual or multimedia signal. [0043] The audio processing system is generally divided into a front-end component, a processing stage and a sample rate converter. The front-end component includes: a dequantization stage adapted to receive quantized spectral coefficients and to output a first frequency domain representation of an intermediate signal; and an internal transform stage for receiving the first frequency domain representation of the intermediate signal and synthesizing, based thereon, a time domain representation of the intermediate signal.
The processing stage, which it may be possible to bypass completely in some embodiments, includes: an analysis filter bank for receiving the time domain representation of the intermediate signal and outputting a second frequency domain representation of the intermediate signal; at least one processing component for receiving said second frequency domain representation of the intermediate signal and outputting a frequency domain representation of a processed audio signal; and a synthesis filter bank for receiving the frequency domain representation of the processed audio signal and outputting a time domain representation of the processed audio signal. Finally, the sample rate converter is configured to receive the time domain representation of the processed audio signal and to output a reconstructed audio signal sampled at a target sampling frequency. [0044] According to an exemplary embodiment, the audio processing system is a single-rate architecture, wherein the respective internal sampling rates of the time domain representation of the intermediate audio signal and of the time domain representation of the processed audio signal are equal. [0045] In particular exemplary embodiments, where the front-end component comprises a core decoder and the processing stage comprises a parametric upmix stage, the core decoder and the parametric upmix stage operate at equal sampling rates. Additionally or alternatively, the core decoder can be extended to handle a wider range of transform lengths, and the sample rate converter can be configured to match standard video frame rates, so as to allow frame-synchronous decoding of audio and video. This will be described in more detail below in the audio mode coding section. [0046] In even more particular exemplary embodiments, the front-end component is operable in an audio mode and in a voice mode distinct from the audio mode. Because the voice mode is specifically adapted to voice content, such signals can be reproduced more faithfully. In the audio mode, the front-end component can operate in a manner similar to what is disclosed in connection with Figure 6 and the associated sections of this description. In the voice mode, the front-end component can operate as discussed, in particular, below in the voice mode coding section. [0047] In general terms, in the exemplary embodiments, the voice mode differs from the audio mode of the front-end component in that the internal transform stage operates at a smaller frame length (or transform size). A reduced frame length has been shown to capture voice content more efficiently. In some exemplary embodiments, the frame length is variable within the audio mode and within the voice mode; it can, for example, be reduced intermittently to capture transients in the signal. Under these circumstances, a mode change from the audio mode to the voice mode will, all other factors being equal, imply a reduction of the frame length of the internal transform stage. Put differently, such a mode change from the audio mode to the voice mode will imply a reduction of the maximum frame length (out of the selectable frame lengths within each of the audio mode and the voice mode). In particular, the frame length in the voice mode can be a fixed fraction (e.g., 1/8) of the current frame length in the audio mode. [0048] In an exemplary embodiment, a bypass line parallel to the processing stage allows the processing stage to be bypassed in decoding modes where no frequency domain processing is desired. This may be suitable when the system decodes discretely coded stereo or multichannel signals, in particular signals where the full spectral range is waveform coded (whereby spectral band replication may not be necessary). To avoid time shifts on occasions when the bypass line is switched into or out of the processing path, the bypass line may preferably comprise a delay stage matching the delay (or algorithmic delay) of the processing stage in its current mode. In embodiments in which the processing stage is arranged to have a constant (algorithmic) delay regardless of its current mode of operation, the delay stage in the bypass line may incur a constant, predetermined delay; otherwise, the delay stage in the bypass line is preferably adaptive and varies according to the current mode of operation of the processing stage.
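A minimal sketch of the delay-compensated bypass just described, assuming a frame-based processing stage object that exposes a process() method and an algorithmic_delay attribute (both names are illustrative, not taken from the patent):

```python
import numpy as np

def delay_compensated_bypass(frames, processing_stage, use_bypass):
    """Route frames through the processing stage or a delay-matched bypass.

    `processing_stage` is assumed to expose process(frame) and an
    algorithmic_delay attribute in samples; both names are illustrative.
    """
    fifo = np.zeros(processing_stage.algorithmic_delay)
    output = []
    for frame in frames:
        if use_bypass:
            # Push the frame through a FIFO so the bypassed signal incurs
            # exactly the same delay as the processing stage would.
            buffer = np.concatenate([fifo, frame])
            output.append(buffer[:len(frame)])
            fifo = buffer[len(frame):]
        else:
            output.append(processing_stage.process(frame))
    return output
```

The point of the FIFO is that switching between the two paths never changes the total latency, which is what allows the seamless mode switching mentioned later in this description.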
[0049] In an exemplary embodiment, the parametric upmix stage is operable in a mode where it receives a 3-channel downmix signal and returns a 5-channel signal. Optionally, a spectral band replication component can be arranged upstream of the parametric upmix stage. In a 3/2 playback channel configuration with three front channels (e.g., L, R, C) and two surround channels (e.g., Ls, Rs), and where the encoded signal is 'front-heavy', this example mode can achieve more efficient coding. Indeed, the available bandwidth of the audio bit stream is then mainly spent on waveform coding as much of the three front channels as possible. An encoding device that prepares the audio bit stream to be decoded by the audio processing system can adaptively select decoding in this mode by measuring properties of the audio signal to be encoded. An exemplary embodiment of the upmix procedure for upmixing a two-channel downmix signal, and of the corresponding downmix procedure, is discussed below under the heading stereo encoding. [0050] In a further development of the preceding exemplary embodiment, two of the three channels in the downmix signal correspond to jointly coded channels in the audio bit stream. This joint coding may, for example, cause the scaling of one channel to be expressed relative to the other channel. A similar approach has been implemented in AAC intensity stereo coding, where two channels can be encoded as a channel pair element. Listening experiments have shown that, at a given bit rate, the perceived quality of the reconstructed audio signal improves when some channels of the downmix signal are jointly coded. [0051] In an exemplary embodiment, the audio processing system additionally comprises a spectral band replication module. The spectral band replication module (or high frequency reconstruction stage) is discussed in more detail below under the heading stereo encoding. The spectral band replication module is preferably active when the parametric upmix stage performs an upmix operation, that is, when it returns a signal with a greater number of channels than the signal it receives. When the parametric upmix stage acts as a pass-through component, however, the spectral band replication module can be operated independently of the particular current mode of the parametric upmix stage; that is, in non-parametric decoding modes, the spectral band replication functionality is optional. [0052] In an exemplary embodiment, the at least one processing component additionally includes a waveform coding stage, which is described in greater detail below in the multi-channel encoding section.
[0053] In an exemplary embodiment, the audio processing system is operable to provide a downmix signal suitable for legacy playback equipment. More precisely, a stereo downmix signal is obtained by adding phase-shifted surround channel content to the first channel of the downmix signal and adding surround channel content with the opposite phase shift (e.g., by 90 degrees) to the second channel. This allows playback equipment to derive the surround channel content by a combined inverse phase shift and subtraction operation. The downmix signal remains acceptable for playback equipment configured to accept a full left/right downmix signal. Preferably, the phase shift functionality is not a permanent setting of the audio processing system, but can be turned off when the audio processing system prepares a downmix signal not intended for playback equipment of this type. In fact, there are well-known specialty content types that reproduce phase-shifted surround signals very poorly; in particular, sound recorded from a source with limited spatial extent that is subsequently panned between a front left signal and a left surround signal will not, as expected, be perceived as located between the corresponding front left and left surround speakers, but will, according to many listeners, lack a well-defined spatial location. This artifact can be avoided by implementing surround channel phase shifting as an optional, non-standard functionality. [0054] In an exemplary embodiment, the front-end component comprises an estimator, a spectrum decoder, an addition unit and an inverse flattening unit. These elements, which enhance system performance when processing speech-like signals, will be described in more detail below under the heading voice mode coding. [0055] In an exemplary embodiment, the audio processing system further comprises an Lfe decoder for preparing at least one additional channel based on information in the audio bit stream. Preferably, the Lfe decoder provides a low frequency effects channel which is waveform coded separately from the other channels carried by the audio bit stream. If the additional channel is coded discretely from the other channels of the reconstructed audio signal, the corresponding processing path can be independent of the rest of the audio processing system. It is understood that each additional channel adds to the total number of channels in the reconstructed audio signal; for example, in a use case where a parametric upmix stage, if provided, operates in an N = 5 mode and where there is one additional channel, the total number of channels in the reconstructed audio signal will be N + 1 = 6. [0056] Further exemplary embodiments provide a method that includes steps corresponding to the operations performed by the above audio processing system when in use, and a computer program product for causing a programmable computer to perform that method. [0057] The inventive concept additionally refers to an encoder-type audio processing system for encoding an audio signal into an audio bit stream having a format suitable for decoding by the (decoder-type) audio processing system described in this document. The inventive concept additionally encompasses encoding methods and computer program products for preparing an audio bit stream.
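As an illustration of the phase-shifted downmix described in paragraph [0053] above, the following sketch folds a five-channel signal into a legacy-compatible stereo pair; the 90-degree shift is implemented here with an analytic-signal (Hilbert) transform, and the gains, signs and channel layout are assumptions made for the example, not values from the patent:

```python
import numpy as np
from scipy.signal import hilbert

def legacy_stereo_downmix(L, R, C, Ls, Rs, g=0.707):
    """Fold five channels into a legacy-compatible stereo pair.

    The surround channels are inserted with opposite 90-degree phase shifts
    (taken as the imaginary part of the analytic signal), so that a matrix
    decoder can recover them by an inverse phase shift and subtraction.
    All gains and signs are illustrative assumptions.
    """
    Ls90 = np.imag(hilbert(Ls))   # Ls shifted by 90 degrees
    Rs90 = np.imag(hilbert(Rs))   # Rs shifted by 90 degrees
    Lt = L + g * C + g * Ls90     # first downmix channel
    Rt = R + g * C - g * Rs90     # second channel: opposite-sign shift
    return Lt, Rt
```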
[0058] Figure 1 shows an audio processing system 100, according to an exemplary embodiment. A core decoder 101 receives an audio bit stream and outputs at least quantized spectral coefficients, which are provided to a front-end component comprising a dequantization stage 102 and an internal transform stage 103. The front-end component may be of a dual-mode type in some exemplary embodiments. In those embodiments, it may be selectably operated in a general-purpose audio mode and a special-purpose audio mode (e.g., a voice mode). A processing stage is delimited, downstream of the front-end component, at its upstream end by an analysis filter bank 104 and at its downstream end by a synthesis filter bank 108. Between the analysis filter bank 104 and the synthesis filter bank 108, a number of components perform frequency domain processing. In the embodiment shown in Figure 1, these components include: • a compression component 105; • a combined component 106 for high frequency reconstruction and parametric stereo upmixing; and • a dynamic range control component 107. Component 106 can perform, for example, upmixing as described below in the stereo encoding section of the present description. [0059] The audio processing system 100 further comprises, downstream of the processing stage, a sample rate converter 109 configured to provide a reconstructed audio signal sampled at a target sampling frequency. [0060] At the downstream end, the system 100 may optionally include a signal limiting component (not shown) responsible for satisfying a no-clip condition. [0061] Additionally and optionally, the system 100 may comprise a parallel processing path for providing one or more additional channels (e.g., a low frequency effects channel). The parallel processing path can be implemented as an Lfe decoder (not shown in any of Figures 1, 3, 4, 5, 6, 7, 8, 9, 10 or 11) that receives the audio bit stream, or a portion thereof, and that is arranged to insert the prepared additional channel(s) into the reconstructed audio signal; the insertion point may be immediately upstream of the sample rate converter 109. [0062] Figure 2 illustrates two mono decoding modes of the audio processing system shown in Figure 1, with corresponding labeling. More precisely, Figure 2 shows those system components that are active during decoding and that form the processing path for preparing the reconstructed (mono) audio signal based on the audio bit stream. It is noted that the processing paths in Figure 2 additionally include a final signal limiting component ("Lim") arranged to scale down signal values in order to satisfy a no-clip condition. The upper decoding mode in Figure 2 uses high frequency reconstruction, while the lower decoding mode in Figure 2 decodes a fully waveform-coded channel. In the lower decoding mode, therefore, the high frequency reconstruction ("HFR") component has been replaced by a delay stage ("Delay") that incurs a delay equal to the algorithmic delay of the HFR component. [0063] As the lower part of Figure 2 suggests, it is also possible to bypass the processing stage ("QMF", "Delay", "DRC", "QMF-1") altogether; this may be applicable when no dynamic range control (DRC) processing is performed on the signal. Bypassing the processing stage eliminates any potential signal deterioration due to QMF analysis followed by QMF synthesis, which may involve imperfect reconstruction. The bypass line includes a second delay stage configured to delay the signal by an amount equal to the total (algorithmic) delay of the processing stage. [0064] Figure 3 illustrates two parametric stereo decoding modes.
In both modes, stereo channels are obtained by applying high frequency reconstruction to a first channel, producing a decorrelated version of it using a decorrelator ("D"), and then forming a linear combination of the two in order to obtain a stereo signal. The linear combination is computed by the upmix stage ("Upmix") arranged upstream of the DRC stage. In one of the modes, the one shown in the lower portion of the drawing, the audio bit stream additionally carries waveform-coded low-frequency content for both channels (area hatched by "\"). The implementation details of this method are described by Figures 7 to 10 and the corresponding sections of the present description. [0065] Figure 4 illustrates a decoding mode in which the audio processing system processes a fully waveform-coded stereo signal with discretely coded channels. This is a high-bit-rate stereo mode. If DRC processing is not considered necessary, the processing stage can be bypassed altogether using two bypass lines with the respective delay stages shown in Figure 4. Preferably, the delay stages incur a delay equal to that of the processing stage in the other decoding modes, so that mode switching can occur seamlessly with respect to the signal content. [0066] Figure 5 illustrates a decoding mode in which the audio processing system provides a five-channel signal by parametrically upmixing a three-channel downmix signal after applying spectral band replication. As mentioned earlier, it is advantageous to code two of the channels jointly (area hatched by "///"), e.g., as a channel pair element, and the audio processing system is preferably designed to handle a bit stream with this property. To this end, the audio processing system comprises two receiving sections, of which the lower one is configured to decode the channel pair element and the upper one to decode the remaining channel (area hatched by "\"). After high frequency reconstruction in the QMF domain, each channel of the channel pair is decorrelated separately, after which a first upmix stage forms a first linear combination of a first channel and a decorrelated version of it, and a second upmix stage forms a second linear combination of the second channel and a decorrelated version of it. The implementation details of this processing are described by Figures 7 to 10 and the corresponding sections of the present description. The total of five channels undergo DRC processing before QMF synthesis. AUDIO MODE CODING [0067] Figure 6 is a generalized block diagram of an audio processing system 100 that receives an encoded audio bit stream P and provides, as its final output, a reconstructed audio signal, shown in Figure 6 as a pair of L, R baseband stereo signals. In this example, it will be assumed that the bit stream P comprises audio data with two quantized, transform-coded channels. The audio processing system 100 can receive the audio bit stream P from a communication network, a wireless receiver or a memory (not shown). The output of the system 100 may be provided to speakers for playback, or may be re-encoded in the same or a different format for further transmission over a communication network or wireless link, or for storage in a memory. [0068] The audio processing system 100 comprises a decoder 108 for decoding the bit stream P into quantized spectral coefficients and control data.
A front-end component 110, the structure of which will be discussed in more detail below, dequantizes these spectral coefficients and provides a time domain representation of an intermediate audio signal to be processed by the processing stage 120. The intermediate audio signal is transformed, by means of analysis filter banks 122L, 122R, into a second frequency domain, different from the frequency domain associated with the aforementioned coding transform; the second frequency domain representation may be a quadrature mirror filter (QMF) representation, in which case the analysis filter banks 122L, 122R may be provided as QMF filter banks. Downstream of the analysis filter banks 122L, 122R, a spectral band replication (SBR) module 124, responsible for high frequency reconstruction, and a dynamic range control (DRC) module 126 process the second frequency domain representation of the intermediate audio signal. Downstream thereof, synthesis filter banks 128L, 128R produce a time domain representation of the audio signal thus processed. As the person skilled in the art will understand after studying this disclosure, neither the spectral band replication module 124 nor the dynamic range control module 126 is a necessary element of the invention; on the contrary, an audio processing system according to a different exemplary embodiment may include additional or alternative modules within the processing stage 120. Downstream of the processing stage 120, a sample rate converter 130 is operable to adjust the sampling rate of the processed audio signal to a desired audio sampling rate, such as 44.1 kHz or 48 kHz, for which the intended playback equipment (not shown) is designed. It is known in the art how to design a sample rate converter 130 producing only a small amount of artifacts in its output. The sample rate converter 130 may be deactivated at times when sample rate conversion is not required, that is, when the processing stage 120 provides a processed audio signal that already has the target sampling frequency. An optional signal limiting module 140, arranged downstream of the sample rate converter 130, is configured to limit baseband signal values as needed, in accordance with a no-clip condition which, again, may be chosen in view of the intended playback equipment. [0069] As shown in the lower portion of Figure 6, the front-end component 110 comprises a dequantization stage 114, which can be operated in one of several modes with different block sizes, and an internal transform stage 118L, 118R, which can likewise operate with different block sizes. Preferably, mode changes of the dequantization stage 114 and of the internal transform stage 118L, 118R are synchronous, so that the block sizes match at all times. Upstream of these components, the front-end component 110 comprises a demultiplexer 112 for separating the quantized spectral coefficients from the control data; typically, it forwards the control data to the internal transform stage 118L, 118R and forwards the quantized spectral coefficients (and optionally, the control data) to the dequantization stage 114. The dequantization stage 114 performs a mapping from a frame of quantization indices (typically represented as integers) to a frame of spectral coefficients (typically represented as floating-point numbers). Each quantization index is associated with a quantization level (or reconstruction point). Because the audio bit stream was prepared using non-uniform quantization, as discussed above, the association is not unique unless it is specified which frequency band the quantization index refers to. Put differently, the dequantization process may follow a different codebook for each frequency band, and the set of codebooks may vary as a function of frame length and/or bit rate.
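A minimal sketch of the per-band dequantization just described; the band layout, codebooks and reconstruction levels are invented for the example:

```python
import numpy as np

# Illustrative per-band codebooks: each frequency band maps quantization
# indices to reconstruction levels (floating-point spectral coefficients).
CODEBOOKS = {
    "band_low":  np.array([-1.0, -0.5, 0.0, 0.5, 1.0]),
    "band_mid":  np.array([-1.0, 0.0, 1.0]),
    "band_high": np.array([0.0]),  # a band allocated zero bits
}

def dequantize(indices_per_band):
    """Map integer quantization indices to spectral coefficients, using a
    different codebook for each frequency band."""
    coefficients = {}
    for band, indices in indices_per_band.items():
        coefficients[band] = CODEBOOKS[band][np.asarray(indices, dtype=int)]
    return coefficients

# Example: dequantize({"band_low": [4, 0], "band_mid": [2], "band_high": [0]})
```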
In Figure 6, this band-wise allocation is schematically illustrated by a plot in which the vertical axis represents frequency and the horizontal axis represents the amount of coding bits allocated per unit frequency. It is observed that the frequency bands are typically wider for higher frequencies and end at one half of the internal sampling frequency fi. The internal sampling frequency may be mapped to a numerically different physical sampling frequency as a result of the resampling in the sample rate converter 130; for example, a 4.3 % upsampling will map fi = 46.034 kHz to the approximate physical frequency 48 kHz and will raise the lower frequency band boundaries by the same factor. As Figure 6 further suggests, the encoder that prepares the audio bit stream typically allocates different amounts of coding bits to different frequency bands, according to the complexity of the coded signal and the expected variations in human auditory sensitivity. [0070] Quantitative data characterizing the operating modes of the audio processing system 100, and in particular of the front-end component 110, are given in Table 1. [0071] The three highlighted columns of Table 1 contain values of controllable quantities, while the remaining quantities can be regarded as dependent on these. It is further observed that the ideal values of the resampling (SRC) factor are (24/25) × (1000/1001) ≈ 0.9590, 24/25 = 0.96 and 1000/1001 ≈ 0.9990. The SRC factor values listed in Table 1 are rounded, as are the frame rate values. A resampling factor of 1.000 is exact and corresponds to the SRC 130 being switched off or entirely absent. In the exemplary embodiments, the audio processing system 100 is operable in at least two modes with different frame lengths, one or more of which may coincide with entries in Table 1. [0072] Modes a-d, in which the frame length of the front-end component is set to 1920 samples, are used to handle (audio) frame rates of 23.976, 24.000, 24.975 and 25.000 Hz, selected to exactly match the video frame rates of widespread coding formats. Due to the different frame rates, the internal sampling frequency (frame rate × frame length) will vary from about 46.034 kHz to 48.000 kHz in modes a-d; assuming critical sampling and evenly spaced frequency bins, this corresponds to bin width values in the range from 11.988 Hz to 12.500 Hz (one half of the internal sampling frequency divided by the frame length). Because the variation in internal sampling frequencies is limited (it is about 5 %, as a consequence of the range of variation of the frame rates being about 5 %), it is understood that the audio processing system 100 will deliver reasonable output quality in all four modes a-d, despite the inexact match with the physical sampling frequency for which the incoming audio bit stream was prepared.
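The arithmetic of the preceding paragraph can be verified with a short computation; the frame rates, frame length and 48 kHz target come from the text, while the script itself is merely a worked example (the SRC factor is expressed as the internal-to-external frequency ratio, matching the ideal values quoted above):

```python
FRAME_LENGTH = 1920            # samples per frame in modes a-d
TARGET_FS = 48_000.0           # target external sampling frequency (Hz)
FRAME_RATES = {"a": 23.976, "b": 24.000, "c": 24.975, "d": 25.000}  # Hz

for mode, rate in FRAME_RATES.items():
    internal_fs = rate * FRAME_LENGTH            # frame rate x frame length
    bin_width = internal_fs / 2 / FRAME_LENGTH   # critical sampling, even bins
    src_factor = internal_fs / TARGET_FS         # resampling factor of the SRC
    print(f"mode {mode}: fi = {internal_fs:8.2f} Hz, "
          f"bin = {bin_width:6.3f} Hz, SRC = {src_factor:.4f}")

# mode a: fi = 46033.92 Hz, bin = 11.988 Hz, SRC = 0.9590
# mode d: fi = 48000.00 Hz, bin = 12.500 Hz, SRC = 1.0000
```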
[0073] Continuing downstream of the front-end component 110, the analysis filter bank (QMF) 122 has 64 bands, corresponding to 30 samples per QMF frame, in all of the modes a-d. In physical terms, this corresponds to a slightly varying width of each analysis frequency band, but again the variation is limited and can be neglected; in particular, the SBR and DRC processing modules 124, 126 can be agnostic of the current mode without detriment to the output quality. The SRC 130, however, is mode dependent and will use a specific resampling factor, chosen to match the ratio of the target external sampling frequency to the internal sampling frequency, to ensure that each frame of the processed audio signal contains a number of samples corresponding to the target external sampling frequency of 48 kHz in physical units. [0074] In each of the modes a-d, the audio processing system 100 will exactly match the video frame rate and the external sampling frequency. The audio processing system 100 can then handle the audio portions of multimedia bit streams T1 and T2, in which audio frames A11, A12, A13, ...; A22, A23, A24, ... and video frames V11, V12, V13, ...; V22, V23, V24, ... coincide in time within each stream. It is then possible to improve the synchronicity of the streams T1, T2 by deleting an audio frame and an associated video frame in the leading stream. Alternatively, an audio frame and an associated video frame in the lagging stream are duplicated and inserted next to their original position, possibly in combination with interpolation measures to reduce perceptible artifacts. [0075] Modes e and f, designed to handle the frame rates 29.97 Hz and 30.00 Hz, can be discerned as a second subgroup. As explained earlier, the quantization of the audio data is adapted (or optimized) to an internal sampling frequency of about 48 kHz. Consequently, because each frame is shorter, the frame length of the front-end component 110 is set to the smaller value of 1536 samples, so that internal sampling frequencies of about 46.034 and 46.080 kHz are achieved. If the analysis filter bank 122 is mode-independent, with 64 frequency bands, each QMF frame will contain 24 samples. [0076] Similarly, frame rates at or around 50 Hz and 60 Hz (corresponding to twice the refresh rate in standard television formats) and 120 Hz are covered by modes g to i (frame length 960 samples), modes j to k (frame length 768 samples) and mode l (frame length 384 samples), respectively. It is observed that the internal sampling frequency remains close to 48 kHz in each case, so that any psychoacoustic tuning of the quantization process by which the audio bit stream was produced will remain at least approximately valid. The respective QMF frame lengths in a filter bank with 64 bands will be 15, 12 and 6 samples. [0077] As mentioned, the audio processing system 100 may be operable to subdivide audio frames into smaller subframes; one reason for this may be to capture audio transients more efficiently. For a sampling frequency of 48 kHz and the settings given in Table 1, Tables 2 through 4 below show the bin widths and frame lengths that result from subdividing a frame into 2, 4, 8 and 16 subframes. The settings according to Table 1 are believed to achieve an advantageous balance of time and frequency resolution. [0078] Decisions regarding the subdivision of a frame can be made as part of the audio bit stream preparation process, e.g., in an audio encoding system (not shown).
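A small worked example of the subframe arithmetic of paragraph [0077], under the critical-sampling assumption used throughout this section, where halving the frame length doubles the bin width; the numbers are derived here rather than copied from Tables 2 to 4:

```python
FRAME_LENGTH = 1920      # samples, modes a-d
INTERNAL_FS = 48_000.0   # Hz, mode d

for subdivision in (1, 2, 4, 8, 16):
    sub_length = FRAME_LENGTH // subdivision        # samples per subframe
    bin_width = INTERNAL_FS / 2 / sub_length        # Hz per frequency bin
    print(f"{subdivision:2d} subframe(s): {sub_length:4d} samples, "
          f"bin width {bin_width:7.3f} Hz")

#  1 subframe(s): 1920 samples, bin width  12.500 Hz
# 16 subframe(s):  120 samples, bin width 200.000 Hz
```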
[0079] As illustrated by mode m in Table 1, the audio processing system 100 can additionally be operable at an increased external sampling frequency of 96 kHz and with 128 QMF bands, corresponding to 30 samples per QMF frame. Because the external sampling frequency coincides with the internal sampling frequency, the SRC factor is unity, corresponding to no resampling being required. MULTI-CHANNEL ENCODING [0080] As used in this section, an audio signal can be a pure audio signal, an audio portion of an audiovisual or multimedia signal, or any of these in combination with metadata. [0081] As used in this section, downmixing a plurality of signals means combining the plurality of signals, for example by forming linear combinations, so that a lower number of signals is obtained. The reverse operation of downmixing is called upmixing, that is, performing an operation on a lower number of signals to obtain a greater number of signals. [0082] Figure 7 is a generalized block diagram of a decoder 100 in a multichannel audio processing system for reconstructing M encoded channels. The decoder 100 comprises three conceptual parts 200, 300, 400, which will be explained in greater detail in conjunction with Figures 8 to 10 below. In the first conceptual part 200, the decoder receives N waveform-coded downmix signals and M waveform-coded signals representing the multichannel audio signal to be decoded, where 1 < N < M. In the illustrated example, N is set to 2. In the second conceptual part 300, the M waveform-coded signals are downmixed and combined with the N waveform-coded downmix signals. High frequency reconstruction (HFR) is performed on the combined downmix signals. In the third conceptual part 400, the high frequency reconstructed signals are upmixed, and the M waveform-coded signals are combined with the upmix signals to reconstruct the M encoded channels. [0083] In the exemplary embodiment described in conjunction with Figures 8 to 10, the reconstruction of 5.1-encoded surround sound is described. It may be noted that the low frequency effects signal is not mentioned in the described embodiment or in the drawings. This does not mean that the low-frequency effects are neglected. The low frequency effects (Lfe) channel is added to the five reconstructed channels in any suitable manner well known to a person skilled in the art. It may also be noted that the described decoder is equally suitable for other types of encoded surround sound, such as 7.1 or 9.1 surround sound. [0084] Figure 8 illustrates the first conceptual part 200 of the decoder 100 in Figure 7. The decoder comprises two receive stages 212, 214. In the first receive stage 212, a bit stream 202 is decoded and dequantized into two waveform-coded downmix signals 208a to 208b. The two waveform-coded downmix signals 208a to 208b each comprise spectral coefficients corresponding to frequencies between a first cutoff frequency ky and a second cutoff frequency kx. [0085] In the second receive stage 214, the bit stream 202 is decoded and dequantized into five waveform-coded signals 210a to 210e. Each of the five waveform-coded signals 210a to 210e comprises spectral coefficients corresponding to frequencies up to the first cutoff frequency ky. [0086] By way of example, the signals 210a to 210e comprise two channel pair elements and a single channel element for the center channel.
The channel pair elements can be, for example, a combination of the front left signal and the left surround signal and a combination of the front right signal and the right surround signal. A further example is a combination of the front left and front right signals and a combination of the left surround and right surround signals. These channel pair elements can, for example, be coded in a sum and difference format. All five signals 210a to 210e can be coded using multi-window transforms with independent windowing and can still be decoded by the decoder. This can allow for improved coding quality and, therefore, improved quality of the decoded signal. [0087] By way of example, the first cutoff frequency ky is 1.1 kHz. By way of example, the second cutoff frequency kx lies within the range of 5.6 to 8 kHz. It should be noted that the first cutoff frequency ky may vary, even on an individual signal basis; that is, the encoder may detect that a signal component in a specific output signal may not be faithfully reproduced by the stereo downmix signals 208a to 208b and may, for that particular time instant, increase the bandwidth, i.e., the first cutoff frequency ky, of the relevant waveform-coded signal among 210a to 210e, to perform proper waveform coding of the signal component. As will be described later in this description, the remaining stages of the decoder 100 typically operate in the quadrature mirror filter (QMF) domain. For that reason, each of the signals 208a to 208b, 210a to 210e received by the first and second receive stages 212, 214, which are received in a modified discrete cosine transform (MDCT) form, is transformed into the time domain by applying an inverse MDCT 216. Each signal is then transformed back into the frequency domain by applying a QMF transform 218. [0055] In Figure 9, the five waveform-coded signals 210 are downmixed, in a downmix stage 308, into two downmix signals 310, 312 comprising spectral coefficients corresponding to frequencies up to the first cutoff frequency ky. These downmix signals 310, 312 can be formed by performing a downmix on the low-pass multichannel signals 210a to 210e using the same downmixing scheme as was used in the encoder to create the two downmix signals 208a to 208b shown in Figure 8. [0056] The two new downmix signals 310, 312 are combined, in a first combining stage 320, 322, with the corresponding downmix signal 208a to 208b to form combined downmix signals 302a to 302b. Each of the combined downmix signals 302a to 302b therefore comprises spectral coefficients corresponding to frequencies up to the first cutoff frequency ky, originating from the downmix signals 310, 312, and spectral coefficients corresponding to frequencies between the first cutoff frequency ky and the second cutoff frequency kx, originating from the two waveform-coded downmix signals 208a to 208b received at the first receive stage 212 (shown in Figure 8).
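A minimal sketch of the first combining stage just described, assuming QMF-domain signals represented as arrays of band coefficients and illustrative band indices standing in for the cutoff frequencies ky and kx:

```python
import numpy as np

def combine_downmix(new_dmx, coded_dmx, ky_band, kx_band):
    """Stitch a combined downmix: bands below ky come from the downmix of the
    waveform-coded channels, bands between ky and kx from the transmitted
    waveform-coded downmix. ky_band/kx_band are QMF band indices, used here
    as illustrative stand-ins for the cutoff frequencies ky and kx."""
    combined = np.zeros_like(coded_dmx)
    combined[:ky_band] = new_dmx[:ky_band]                    # 0 .. ky
    combined[ky_band:kx_band] = coded_dmx[ky_band:kx_band]    # ky .. kx
    return combined   # bands above kx are filled later by the HFR stage
```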
[0057] The decoder additionally comprises a high frequency reconstruction (HFR) stage 314. The HFR stage is configured to extend each of the two combined downmix signals 302a to 302b from the combining stage to a frequency range above the second cutoff frequency kx by performing high frequency reconstruction. The high frequency reconstruction performed may, according to some embodiments, comprise performing spectral band replication (SBR). The high frequency reconstruction can be done using high frequency reconstruction parameters, which may be received by the HFR stage 314 in any suitable manner. [0058] The output of the high frequency reconstruction stage 314 are two signals 304a to 304b, comprising the downmix signals 208a to 208b with the HFR extensions 316, 318 applied. As described above, the HFR stage 314 performs high frequency reconstruction based on the frequencies present in the input signals 210a to 210e from the second receive stage 214 (shown in Figure 8), combined with the two downmix signals 208a to 208b. In somewhat simplified terms, the HFR range 316, 318 comprises parts of the spectral coefficients of the downmix signals 310, 312 that have been copied up to the HFR range 316, 318. Consequently, parts of the five waveform-coded signals 210a to 210e will appear in the HFR range 316, 318 of the output 304 of the HFR stage 314. [0059] It should be noted that the downmixing in the downmix stage 308 and the combining in the first combining stage 320, 322, before the high frequency reconstruction stage 314, could be done in the time domain, that is, after each signal has been transformed into the time domain by applying an inverse modified discrete cosine transform (MDCT) 216 (shown in Figure 8). However, given that the waveform-coded signals 210a to 210e and the waveform-coded downmix signals 208a to 208b may be coded by a waveform coder using overlapping multi-window transforms with independent windowing, the signals 210a to 210e and 208a to 208b may not be seamlessly combinable in the time domain. Therefore, a better controlled scenario is achieved if at least the combining in the first combining stage 320, 322 is performed in the QMF domain. [0060] Figure 10 illustrates the third and final conceptual part 400 of the decoder 100. The output 304 of the HFR stage 314 constitutes the input to an upmix stage 402. The upmix stage 402 creates an output of five signals 404a to 404e by performing parametric upmixing on the frequency-extended signals 304a to 304b. Each of the five upmix signals 404a to 404e corresponds to one of the five channels of the 5.1-encoded surround sound for frequencies above the first cutoff frequency ky. According to an exemplary parametric upmix procedure, the upmix stage 402 first receives parametric upmix parameters. The upmix stage 402 then generates decorrelated versions of the two combined frequency-extended downmix signals 304a to 304b. The upmix stage 402 subjects the two combined frequency-extended downmix signals 304a to 304b and the decorrelated versions thereof to a matrix operation, in which the parameters of the matrix operation are given by the upmix parameters. Alternatively, any other parametric upmix procedure known in the art can be applied. Applicable parametric upmix procedures are described, for example, in "MPEG Surround - The ISO/MPEG Standard for Efficient and Compatible Multichannel Audio Coding" (Herre et al., Journal of the Audio Engineering Society, Vol. 56, No. 11, November 2008).
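A minimal sketch of the parametric upmix operation just described, for a single QMF time/frequency tile: the two downmix channels and their decorrelated versions are stacked and subjected to a matrix operation; the matrix entries would be derived from the transmitted upmix parameters, and the placeholder values below are illustrative only:

```python
import numpy as np

def parametric_upmix(dmx, decorrelated, upmix_matrix):
    """Produce five output channels from two frequency-extended downmix
    channels and their decorrelated versions via a matrix operation.

    dmx, decorrelated: arrays of shape (2,) for one QMF time/frequency tile.
    upmix_matrix: shape (5, 4); in a real decoder its entries come from the
    transmitted upmix parameters.
    """
    stacked = np.concatenate([dmx, decorrelated])  # shape (4,)
    return upmix_matrix @ stacked                  # shape (5,)

# Example with placeholder parameters:
# M = np.random.default_rng(0).normal(size=(5, 4))
# five = parametric_upmix(np.array([0.3, -0.1]), np.array([0.05, 0.02]), M)
```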
[0061] The output 404a to 404e of the upmix stage 402 does not comprise frequencies below the first cutoff frequency ky. The remaining spectral coefficients, corresponding to frequencies up to the first cutoff frequency ky, reside in the five waveform-coded signals 210a to 210e, which have been delayed by a delay stage 412 to match the timing of the upmix signals 404. [0062] The decoder 100 additionally comprises a second combining stage 416, 418. The second combining stage 416, 418 is configured to combine the five upmix signals 404a to 404e with the five waveform-coded signals 210a to 210e that were received by the second receive stage 214 (shown in Figure 8). [0063] It may be noted that any present Lfe signal can be added as a separate signal to the resulting combined signal 422. Each of the signals 422 is transformed into the time domain by applying an inverse QMF transform 420. The output of the inverse QMF transform 420 is the fully decoded 5.1 channel audio signal. [0064] Figure 11 illustrates a decoding system 100' which is a modification of the decoding system 100 of Figure 7. The decoding system 100' has conceptual parts 200', 300' and 400' corresponding to the conceptual parts 200, 300 and 400 of Figure 7. The difference between the decoding system 100' of Figure 11 and the decoding system of Figure 7 is that there is a third receive stage 616 in the conceptual part 200' and an interleaving stage 714 in the third conceptual part 400'. [0065] The third receive stage 616 is configured to receive an additional waveform-coded signal. The additional waveform-coded signal comprises spectral coefficients corresponding to a subset of the frequencies above the first cutoff frequency. The additional waveform-coded signal can be transformed into the time domain by applying an inverse MDCT 216. It can then be transformed back into the frequency domain by applying a QMF transform 218. [0066] It should be understood that the additional waveform-coded signal may be received as a separate signal. However, the additional waveform-coded signal may also form part of one or more of the five waveform-coded signals 210a to 210e. In other words, the additional waveform-coded signal may be coded jointly with one or more of the five waveform-coded signals 210a to 210e, for example using the same MDCT transform. If so, the third receive stage 616 corresponds to the second receive stage; that is, the additional waveform-coded signal is received together with the five waveform-coded signals 210a to 210e through the second receive stage 214. [0067] Figure 12 illustrates the third conceptual part 400' of the decoder 100' of Figure 11 in more detail. The additional waveform-coded signal 710 is input to the third conceptual part 400' in addition to the frequency-extended combined downmix signals 304a to 304b and the five waveform-coded signals 210a to 210e. In the illustrated example, the additional waveform-coded signal 710 corresponds to the third channel of the five channels. The additional waveform-coded signal 710 further comprises spectral coefficients corresponding to a frequency range starting at the first cutoff frequency ky. The shape of the subset of the frequency range above the first cutoff frequency covered by the additional waveform-coded signal 710 may, of course, vary between embodiments. It should also be noted that a plurality of additional waveform-coded signals 710a to 710e may be received, wherein different additional waveform-coded signals may correspond to different output channels.
The subset of the frequency range covered by the plurality of additional waveform-coded signals 710a to 710e may vary between the different signals of the plurality of additional waveform-coded signals 710a to 710e. [0068] The additional waveform-coded signal 710 may be delayed by a delay stage 712 in order to match the timing of the upmix signals 404 that are output from the upmix stage 402. The upmix signals 404 and the additional waveform-coded signal 710 are input to an interleaving stage 714. The interleaving stage 714 interleaves, i.e., combines, the upmix signals 404 with the additional waveform-coded signal 710 to generate an interleaved signal 704. In the present example, the interleaving stage 714 interleaves the third upmix signal 404c with the additional waveform-coded signal 710. The interleaving can be performed by adding the two signals together. However, typically, the interleaving is performed by replacing the upmix signals 404 with the additional waveform-coded signal 710 in the frequency range and time range where the signals overlap. [0069] The interleaved signal 704 is input to the second combining stage 416, 418, where it is combined with the waveform-coded signals 210a to 210e to generate an output signal 722, in the same manner as described with reference to Figure 10. It should be noted that the order of the interleaving stage 714 and the second combining stage 416, 418 may be reversed, so that the combining is performed before the interleaving. [0070] Also, in the situation where the additional waveform-coded signal 710 forms part of one or more of the five waveform-coded signals 210a to 210e, the second combining stage 416, 418 and the interleaving stage 714 may be combined into a single stage. Specifically, such a combined stage would use the spectral content of the five waveform-coded signals 210a to 210e for frequencies up to the first cutoff frequency ky. For frequencies above the first cutoff frequency, the combined stage would use the upmix signals 404 interleaved with the additional waveform-coded signal 710. [0071] The interleaving stage 714 may operate under the control of a control signal. To this end, the decoder 100' may receive, for example via the third receive stage 616, a control signal which indicates how to interleave the additional waveform-coded signal with one of the M upmix signals. For example, the control signal may indicate the frequency range and the time range in which the additional waveform-coded signal 710 is to be interleaved with one of the upmix signals 404. For example, the frequency range and the time range may be expressed in terms of the time/frequency blocks in which the interleaving is to be done. The time/frequency blocks may be time/frequency blocks with respect to the time/frequency grid of the QMF domain in which the interleaving takes place. [0072] The control signal may use vectors, such as binary vectors, to indicate the time/frequency blocks in which the interleaving is to be done. Specifically, there may be a first vector, related to the frequency direction, that indicates the frequencies at which the interleaving is to be performed. The indication may be made, for example, by a logic one for the corresponding frequency interval in the first vector. There may also be a second vector, related to the time direction, that indicates the time intervals at which the interleaving is to be performed. The indication may be made, for example, by a logic one for the corresponding time interval in the second vector. For this purpose, a time frame is typically divided into a plurality of time slots, so that the time indication can be made on a subframe basis. By crossing the first and second vectors, a time/frequency matrix can be constructed. For example, the time/frequency matrix may be a binary matrix comprising a logic one for each time/frequency block for which both the first and the second vector indicate a logic one. The interleaving stage 714 may use the time/frequency matrix when performing the interleaving, for example such that one or more of the upmix signals 404 is replaced by the additional waveform-coded signal 710 for the time/frequency blocks that are indicated by a logic one in the time/frequency matrix. [0073] It is noted that the vectors may use schemes other than a binary scheme to indicate the time/frequency blocks in which the interleaving is to be performed. For example, the vectors could indicate, by a first value such as zero, that no interleaving is to be done, and, by a second value, that the interleaving is to be done with respect to a particular channel identified by that second value.
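A minimal sketch of the control-signal-driven interleaving described in paragraphs [0071] to [0073] above, assuming a QMF-domain channel laid out as a (bands × slots) array and binary indication vectors whose outer product yields the time/frequency matrix:

```python
import numpy as np

def interleave(upmix_tile, extra_tile, freq_vec, time_vec):
    """Replace upmix content with additional waveform-coded content in the
    time/frequency blocks selected by the control signal.

    upmix_tile, extra_tile: arrays of shape (n_bands, n_slots), one channel.
    freq_vec (n_bands,), time_vec (n_slots,): binary indication vectors.
    """
    tf_matrix = np.outer(freq_vec, time_vec).astype(bool)  # time/frequency matrix
    out = upmix_tile.copy()
    out[tf_matrix] = extra_tile[tf_matrix]  # replacement, not addition
    return out

# Example: interleave bands 2-3 during slots 0-1 of a 4x4 grid.
# u = np.zeros((4, 4)); e = np.ones((4, 4))
# interleave(u, e, np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0]))
```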
STEREO ENCODING [0074] As used in this section, left/right coding or encoding means that the left (L) and right (R) stereo signals are encoded without performing any transform between the signals. [0075] As used in this section, sum and difference coding or encoding means that the sum M of the left and right stereo signals is encoded as one signal (the sum) and the difference S between the left and right stereo signals is encoded as one signal (the difference). Sum and difference coding may also be called mid/side coding. The relationship between the left/right form and the sum and difference form is M = L + R and S = L - R. It may be noted that different normalizations or scalings are possible when the left and right stereo signals are transformed into the sum and difference form and vice versa, as long as the transform is consistent in both directions. In this disclosure, M = L + R and S = L - R is mainly used, but a system using a different scaling, e.g., M = (L + R)/2 and S = (L - R)/2, works equally well. [0076] As used in this section, downmix/complementary (dmx/comp) conversion or coding means subjecting the left and right stereo signals to a matrix multiplication depending on a weighting parameter a prior to encoding. dmx/comp coding may also be called dmx/comp/a coding. The relationship between the downmix/complementary form, the left/right form and the sum and difference form is typically dmx = L + R = M and comp = (1 - a)L - (1 + a)R = -aM + S. Notably, the downmix signal of the downmix/complementary representation is equivalent to the sum signal M of the sum and difference representation. [0077] As used in this section, an audio signal can be a pure audio signal, an audio portion of an audiovisual or multimedia signal, or any of these in combination with metadata.
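The coding forms defined in paragraphs [0074] to [0076] above amount to small linear maps; the following sketch uses the unnormalized convention M = L + R, S = L - R from the text (function names are illustrative):

```python
import numpy as np

def lr_to_ms(L, R):
    """Left/right to sum and difference (mid/side): M = L + R, S = L - R."""
    return L + R, L - R

def ms_to_lr(M, S):
    """Inverse transform, consistent with the M = L + R, S = L - R scaling."""
    return (M + S) / 2, (M - S) / 2

def lr_to_dmx_comp(L, R, a):
    """Downmix/complementary form: dmx = L + R = M and
    comp = (1 - a)*L - (1 + a)*R = -a*M + S, with weighting parameter a."""
    return L + R, (1 - a) * L - (1 + a) * R

# Round-trip check with arbitrary data:
# L, R = np.array([0.5, -0.2]), np.array([0.1, 0.4])
# assert np.allclose(ms_to_lr(*lr_to_ms(L, R)), (L, R))
```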
[0078] Figure 13 is a generalized block diagram of a decoding system 100 comprising three conceptual parts 200, 300, 400, which will be explained in greater detail in conjunction with Figures 14 to 16 below. In the first conceptual part 200, a bit stream is received and decoded into a first and a second signal. The first signal comprises both a first waveform-coded signal, comprising spectral data corresponding to frequencies up to a first cutoff frequency, and a waveform-coded downmix signal, comprising spectral data corresponding to frequencies above the first cutoff frequency. The second signal comprises only a second waveform-coded signal, comprising spectral data corresponding to frequencies up to the first cutoff frequency. [0079] In the second conceptual part 300, if the waveform-coded parts of the first and second signals are not in a sum and difference form, i.e., an M/S form, they are transformed into the sum and difference form. Thereafter, the first and second signals are transformed into the time domain and then into the quadrature mirror filter (QMF) domain. In the third conceptual part 400, the first signal is high frequency reconstructed (HFR). Both the first and second signals are then upmixed to create a left and a right stereo signal output having spectral coefficients corresponding to the entire frequency band of the encoded signal that is decoded by the decoding system 100. [0080] Figure 14 illustrates the first conceptual part 200 of the decoding system 100 in Figure 13. The decoding system 100 comprises a receive stage 212. In the receive stage 212, a bit stream frame 202 is decoded and dequantized into a first signal 204a and a second signal 204b. The bit stream frame 202 corresponds to a time frame of the two audio signals being decoded. The first signal 204a comprises a first waveform-coded signal 208, comprising spectral data corresponding to frequencies up to a first cutoff frequency ky, and a waveform-coded downmix signal 206, comprising spectral data corresponding to frequencies above the first cutoff frequency ky. By way of example, the first cutoff frequency ky is 1.1 kHz. [0081] According to some embodiments, the waveform-coded downmix signal 206 comprises spectral data corresponding to frequencies between the first cutoff frequency ky and a second cutoff frequency kx. By way of example, the second cutoff frequency kx lies within the range of 5.6 to 8 kHz. [0082] The received first and second waveform-coded signals 208, 210 may be waveform-coded in a left/right form, in a sum and difference form and/or in a downmix/complementary form, where the complementary signal depends on a weighting parameter a, which may be signal adaptive. The waveform-coded downmix signal 206 corresponds to a downmix suitable for parametric stereo which, as stated above, corresponds to a sum form. However, the signal 204b has no content above the first cutoff frequency ky. Each of the signals 206, 208, 210 is represented in the modified discrete cosine transform (MDCT) domain. [0083] Figure 15 illustrates the second conceptual part 300 of the decoding system 100 in Figure 13. The decoding system 100 comprises a mixing stage 302. The design of the decoding system 100 requires that the input to the high frequency reconstruction stage, which will be described in more detail below, be in a sum format. Accordingly, the mixing stage 302 is configured to check whether the first and second waveform-coded signals 208, 210 are in a sum and difference form. If the first and second waveform-coded signals 208, 210 are not in a sum and difference form for all frequencies up to the first cutoff frequency ky, the mixing stage 302 will transform the entire waveform-coded signals 208, 210 into the sum and difference form.
If at least a subset of the frequencies of the input signals 208, 210 to the mixing stage 302 is in a downmix/complementary form, the weighting parameter a is required as an input to the mixing stage 302. It is noted that the input signals 208, 210 may comprise several frequency subsets encoded in a downmix/complementary form and that, in that case, each subset need not be encoded using the same value of the weighting parameter a. In that case, several weighting parameters a are needed as inputs to the mixing stage 302.

[0084] As mentioned above, the mixing stage 302 always outputs a sum and difference representation of the input signals 204a, 204b. In order to be able to transform signals represented in the MDCT domain into a sum and difference representation, the windowing of the MDCT encoded signals needs to be the same. This implies that, if the first and second waveform encoded signals 208, 210 are in an L/R form or in a downmix/complementary form, the window for the signal 204a and the window for the signal 204b cannot be independent.

[0085] Conversely, if the first and second waveform encoded signals 208, 210 are in a sum and difference form, the window for the signal 204a and the window for the signal 204b can be independent.

[0086] After the mixing stage 302, the sum and difference signal is transformed into the time domain by applying an inverse modified discrete cosine transform (inverse MDCT) 312.

[0087] The two signals 304a, 304b are then analyzed with two QMF banks 314. Since the downmix signal 306 does not comprise the lower frequencies, there is no need to analyze the signal with a Nyquist filter bank to increase the frequency resolution. This can be compared to systems in which the downmix signal comprises low frequencies, e.g. conventional parametric stereo decoding such as MPEG-4 parametric stereo. In those systems, the downmix signal needs to be analyzed with a Nyquist filter bank in order to increase the frequency resolution beyond what is achieved by a QMF bank, so as to better match the frequency selectivity of the human auditory system, as represented by the Bark frequency scale.

[0088] The output signal 304 of the QMF banks 314 comprises a first signal 304a, which is a combination of a waveform encoded sum signal 308, comprising spectral data corresponding to frequencies up to the first cutoff frequency ky, and the waveform encoded downmix signal 306, comprising spectral data corresponding to frequencies between the first cutoff frequency ky and the second cutoff frequency kx. The output signal 304 further comprises a second signal 304b, which comprises a waveform encoded difference signal 310 comprising spectral data corresponding to frequencies up to the first cutoff frequency ky. The signal 304b has no content above the first cutoff frequency ky.

[0089] As will be described later, a high frequency reconstruction stage 416 (shown in conjunction with Figure 16) uses the lower frequencies, i.e. the first waveform encoded signal 308 and the waveform encoded downmix signal 306 of the output signal 304, to reconstruct the frequencies above the second cutoff frequency kx. It is advantageous that the signal on which the high frequency reconstruction stage 416 operates is a signal of a similar type across the lower frequencies. From this perspective, it is advantageous to have the mixing stage 302 always output a sum and difference representation of the first and second waveform encoded signals 208, 210, since this implies that the first waveform encoded signal 308 and the waveform encoded downmix signal 306 of the first output signal 304a are similar in character.
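As an illustration of the behavior described for the mixing stage 302, the following sketch converts per-frequency subsets of two MDCT-domain signals into the sum and difference form, using one weighting parameter a per downmix/complementary subset. It is a minimal sketch under the stated assumptions; the subset description format and the function name are invented for the example.

```python
import numpy as np

def to_sum_difference(first, second, subsets):
    """Sketch of the mixing stage 302: convert every frequency subset of the
    two MDCT-domain input signals to the sum and difference form. `subsets`
    is a list of (lo, hi, kind, a) entries, where `kind` is one of 'lr',
    'dmx_comp' or 'ms', and `a` is the weighting parameter used by
    dmx/comp subsets (ignored otherwise)."""
    M, S = first.copy(), second.copy()
    for lo, hi, kind, a in subsets:
        x, y = first[lo:hi], second[lo:hi]
        if kind == "lr":               # M = L + R, S = L - R
            M[lo:hi], S[lo:hi] = x + y, x - y
        elif kind == "dmx_comp":       # dmx = M, comp = -a*M + S
            M[lo:hi], S[lo:hi] = x, y + a * x
        # subsets already in 'ms' form pass through unchanged
    return M, S
```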
[0090] Figure 16 illustrates the third conceptual part 400 of the decoding system 100 in Figure 13. The high frequency reconstruction (HFR) stage 416 performs high frequency reconstruction by extending the downmix signal 306 of the first input signal 304a to a frequency range above the second cutoff frequency kx. Depending on the configuration of the HFR stage 416, the input to the HFR stage 416 is either the entire signal 304a or just the downmix signal 306. The high frequency reconstruction is performed using high frequency reconstruction parameters, which may be received by the high frequency reconstruction stage 416 in any suitable manner. According to one embodiment, the high frequency reconstruction performed comprises performing spectral band replication (SBR).

[0091] The output of the high frequency reconstruction stage 416 is a signal 404 comprising the downmix signal 406 with the SBR extension 412 applied. The high frequency reconstructed signal 404 and the signal 304b are fed into an upmix stage 420 to generate a left stereo signal L and a right stereo signal R 412a, 412b. For the spectral coefficients corresponding to frequencies below the first cutoff frequency ky, the upmixing comprises performing an inverse sum and difference transform of the first and second signals 408, 310. This simply means going from a mid-side representation to a left-right representation, as outlined earlier. For the spectral coefficients corresponding to frequencies above the first cutoff frequency ky, the downmix signal 406 and the SBR extension 412 are fed through a decorrelator 418. The downmix signal 406 and the SBR extension 412, together with the decorrelated version of the downmix signal 406 and the SBR extension 412, are then upmixed using parametric mixing parameters to reconstruct the left and right channels 416, 414 for frequencies above the first cutoff frequency ky. Any parametric upmixing procedure known in the art can be applied.

[0092] It should be noted that, in the above exemplary embodiment 100 of the decoder shown in Figures 13 to 16, high frequency reconstruction is required, since the first received signal 204a only comprises spectral data corresponding to frequencies up to the second cutoff frequency kx. In further embodiments, the first received signal comprises spectral data corresponding to all frequencies of the encoded signal. According to such embodiments, high frequency reconstruction is not required. The person skilled in the art understands how to adapt the exemplary decoder 100 in that case.
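Since any parametric upmixing procedure known in the art can be applied in the upmix stage 420, the following sketch shows only one illustrative prediction-plus-decorrelation upmix rule for a band of the downmix m and its decorrelated version d; the specific rule and the parameter names alpha and beta are assumptions made for this example, not the rule mandated by the disclosure.

```python
def parametric_upmix(m, d, alpha, beta):
    """One illustrative parametric upmix of a downmix band m and its
    decorrelated version d into left/right bands, with a per-band
    prediction parameter alpha and a decorrelation gain beta:
        l = (1 + alpha) * m + beta * d
        r = (1 - alpha) * m - beta * d
    For alpha = 0 and beta = 1 this reduces to an inverse sum and
    difference transform with d acting as the side signal."""
    l = (1 + alpha) * m + beta * d
    r = (1 - alpha) * m - beta * d
    return l, r
```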
[0093] Figure 17 shows, by way of example, a generalized block diagram of an encoding system 500, according to one embodiment.

[0094] In the encoding system, a first and a second signal 540, 542 to be encoded are received by a receiving stage (not shown). These signals 540, 542 represent a time frame of the left 540 and the right 542 stereo audio channel. The signals 540, 542 are represented in the time domain. The encoding system comprises a transform stage 510. The signals 540, 542 are transformed into a sum and difference form 544, 546 at the transform stage 510.

[0095] The encoding system further comprises a waveform encoding stage 514 configured to receive the first and second transformed signals 544, 546 from the transform stage 510. The waveform encoding stage typically operates in the MDCT domain. For that reason, the transformed signals 544, 546 are subjected to an MDCT transform 512 before the waveform encoding stage 514. In the waveform encoding stage, the first and second transformed signals 544, 546 are waveform encoded into a first and a second waveform encoded signal 518, 520, respectively.

[0096] For frequencies above a first cutoff frequency ky, the waveform encoding stage 514 is configured to waveform encode the first transformed signal 544 into a waveform encoded signal 552 of the first waveform encoded signal 518. The waveform encoding stage 514 can be configured to set the second waveform encoded signal 520 to zero above the first cutoff frequency ky, or not to encode those frequencies at all.

[0097] For frequencies below the first cutoff frequency ky, a decision is made at the waveform encoding stage 514 as to which type of stereo encoding to use for the two signals 548, 550. Depending on the characteristics of the transformed signals 544, 546 below the first cutoff frequency ky, different decisions can be made for different subsets of the waveform encoded signals 548, 550. The encoding can be either left/right encoding, mid/side encoding (i.e. sum and difference encoding) or dmx/comp/a encoding. If the signals 548, 550 are waveform encoded by sum and difference encoding at the waveform encoding stage 514, the waveform encoded signals 518, 520 can be encoded using overlapping windowed transforms with independent windowing for the signals 518, 520, respectively.

[0098] An exemplary first cutoff frequency ky is 1.1 kHz, but this frequency may vary depending on the bit rate of the stereo audio system or depending on the characteristics of the audio being encoded.

[0099] At least two signals 518, 520 are output from the waveform encoding stage 514. If one or several subsets, or the entire frequency band, of the signals below the first cutoff frequency ky is encoded in a downmix/complementary form by performing a matrix operation depending on the weighting parameter a, this parameter is also output as a signal 522. In the event that multiple subsets are encoded in a downmix/complementary form, each subset need not be encoded using the same value of the weighting parameter a. In that case, several weighting parameters are output as the signal 522.

[00100] These two or three signals 518, 520, 522 are encoded and quantized 524 into a single composite signal 558.

[00101] In order to be able to reconstruct the spectral data of the first and second signals 540, 542 for frequencies above the first cutoff frequency on a decoder side, parametric stereo parameters 536 need to be extracted from the signals 540, 542. To this end, the encoder 500 comprises a parametric stereo (PS) encoding stage 530. The PS encoding stage 530 typically operates in the QMF domain. Therefore, before being input to the PS encoding stage 530, the first and second signals 540, 542 are transformed into the QMF domain by a QMF analysis stage 526. The PS encoding stage 530 is adapted to extract the parametric stereo parameters 536 only for frequencies above the first cutoff frequency ky.
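The division of labor across the frequency axis described above can be summarized as follows. The sketch below maps frequency bins to the coding tools of this embodiment; the cutoff defaults merely echo the example values given in the text (1.1 kHz, and 5.6 to 8 kHz for the second cutoff) and all names are illustrative.

```python
def coding_plan(bin_freqs_hz, ky_hz=1100.0, kx_hz=5600.0):
    """Assign each frequency bin to the coding tools described above:
    below ky both channels are waveform coded (L/R, M/S or dmx/comp);
    between ky and kx only the downmix is waveform coded and parametric
    stereo (PS) parameters carry the stereo image; above kx only PS and
    high frequency reconstruction (HFR) parameters are transmitted."""
    plan = []
    for f in bin_freqs_hz:
        if f <= ky_hz:
            plan.append("waveform L/R, M/S or dmx/comp")
        elif f <= kx_hz:
            plan.append("waveform downmix + PS parameters")
        else:
            plan.append("PS + HFR parameters")
    return plan

# Example: a few representative bin center frequencies in Hz.
print(coding_plan([500.0, 3000.0, 10000.0]))
```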
[00102] It is noted that the parametric stereo parameters 536 reflect the stereo characteristics of the signal being encoded. They are frequency selective, i.e. each parameter of the parameters 536 may correspond to a subset of the frequencies of the left or right input signals 540, 542. The PS encoding stage 530 calculates the parametric stereo parameters 536 and quantizes them, either uniformly or non-uniformly. The parameters are calculated frequency selectively, as mentioned above, wherein the entire frequency range of the input signals 540, 542 is divided into, for example, 15 parameter bands. These may be spaced according to a model of the frequency resolution of the human auditory system, e.g. a Bark scale.

[00103] In the exemplary embodiment of the encoder 500 shown in Figure 17, the waveform encoding stage 514 is configured to waveform encode the first transformed signal 544 for frequencies between the first cutoff frequency ky and a second cutoff frequency kx, and to set the first waveform encoded signal 518 to zero above the second cutoff frequency kx. This may be done to further reduce the required bit rate of the audio system of which the encoder 500 is a part. In order to be able to reconstruct the signal above the second cutoff frequency kx, high frequency reconstruction parameters 538 need to be generated. According to this exemplary embodiment, this is done by downmixing the two signals 540, 542, represented in the QMF domain, in a downmixing stage 534. The resulting downmix signal, which for example is equal to the sum of the signals 540, 542, is subjected to high frequency reconstruction encoding in a high frequency reconstruction (HFR) encoding stage 532 in order to generate the high frequency reconstruction parameters 538. The parameters 538 may, for example, include a spectral envelope of the frequencies above the second cutoff frequency kx and noise addition information, as is well known to the person skilled in the art.

[00104] An exemplary second cutoff frequency kx is 5.6 to 8 kHz, but this frequency may vary depending on the bit rate of the stereo audio system or depending on the characteristics of the audio being encoded.

[00105] The encoder 500 further comprises a bitstream generating stage, i.e. a bitstream multiplexer 562. According to the exemplary embodiment of the encoder 500, the bitstream generating stage is configured to receive the encoded and quantized signal 558 and the two parameter signals 536, 538. These are converted into a bit stream 560 by the bitstream generating stage 562, to be further distributed in the stereo audio system.

[00106] According to another embodiment, the waveform encoding stage 514 is configured to waveform encode the first transformed signal 544 for all frequencies above the first cutoff frequency ky. In that case, the HFR encoding stage 532 is not needed and, consequently, no high frequency reconstruction parameters 538 are included in the bit stream.

[00107] Figure 18 shows, by way of example, a generalized block diagram of an encoder system 600, according to another embodiment.

VOICE MODE ENCODING

[00108] Figure 19a shows a block diagram of an exemplary transform-based speech encoder 100. The encoder 100 receives as an input a block 131 of transform coefficients (also referred to as a coding unit).
The transform coefficient block 131 may have been obtained by a transform unit configured to transform a sequence of samples of the audio input signal from the time domain into the transform domain. The transform unit can be configured to perform an MDCT. The transform unit can be part of a generic audio codec such as AAC or HE-AAC. Such a generic audio codec can make use of different block sizes, for example a long block and a short block. Exemplary block sizes are 1024 samples for a long block and 256 samples for a short block. Assuming a sampling rate of 44.1 kHz and 50% overlap, a long block covers approximately 20 ms of the audio input signal and a short block covers approximately 5 ms of the audio input signal. Long blocks are typically used for stationary segments of the audio input signal and short blocks are typically used for transient segments of the audio input signal.

[00109] Speech signals can be considered stationary in time segments of about 20 ms. In particular, the spectral envelope of a speech signal can be considered stationary in time segments of about 20 ms. In order to be able to derive meaningful statistics in the transform domain for such 20 ms segments, it may be useful to provide the transform-based speech encoder 100 with short blocks 131 of transform coefficients (having a length of, for example, 5 ms). In doing so, a plurality of short blocks 131 can be used to derive statistics with respect to time segments of, for example, 20 ms (e.g., the time segment of a long block). Furthermore, this has the advantage of providing adequate time resolution for speech signals.

[00110] Therefore, the transform unit can be configured to provide short blocks 131 of transform coefficients in case a current segment of the audio input signal is classified as speech. The encoder 100 may comprise a framing unit 101 configured to extract a plurality of blocks 131 of transform coefficients, referred to as a set 132 of blocks 131. The set 132 of blocks may also be referred to as a frame. By way of example, the set 132 of blocks 131 may comprise four short blocks of 256 transform coefficients, thus covering approximately a 20 ms segment of the audio input signal.

[00111] The set 132 of blocks can be provided to an envelope estimation unit 102. The envelope estimation unit 102 can be configured to determine an envelope 133 based on the set 132 of blocks. The envelope 133 may be based on root mean square (RMS) values of corresponding transform coefficients of the plurality of blocks 131 comprised within the set 132 of blocks. A block 131 typically provides a plurality of transform coefficients (e.g., 256 transform coefficients) in a plurality of corresponding frequency bins 301 (see Figure 21a). The plurality of frequency bins 301 may be grouped into a plurality of frequency bands 302. The plurality of frequency bands 302 may be selected based on psychoacoustic considerations. By way of example, the frequency bins 301 may be grouped into frequency bands 302 according to a logarithmic scale or a Bark scale. The envelope 133 that has been determined based on a current set 132 of blocks may comprise a plurality of energy values for the plurality of frequency bands 302, respectively. A particular energy value for a particular frequency band 302 can be determined based on the transform coefficients of the blocks 131 of the set 132 that correspond to frequency bins 301 falling within the particular frequency band 302. The particular energy value can be determined based on the RMS value of these transform coefficients. As such, an envelope 133 for a current set 132 of blocks (referred to as a current envelope 133) may be indicative of an average envelope of the blocks 131 of transform coefficients comprised within the current set 132 of blocks, or may be indicative of an average envelope of the blocks of transform coefficients used to determine the envelope 133.
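The RMS-based band envelope just described can be sketched as follows; the band edges and the block dimensions are example values (four short blocks of 256 coefficients), and the function name is illustrative.

```python
import numpy as np

def band_envelope(blocks, band_edges):
    """Compute one energy value per frequency band from a set of transform
    blocks (e.g. four short blocks of 256 MDCT coefficients each), using
    the RMS of the coefficients whose bins fall into each band."""
    coeffs = np.stack(blocks)            # shape: (num_blocks, num_bins)
    env = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        env.append(np.sqrt(np.mean(coeffs[:, lo:hi] ** 2)))
    return np.array(env)

# Example: four short blocks, bands that widen roughly logarithmically.
blocks = [np.random.randn(256) for _ in range(4)]
edges = [0, 4, 8, 16, 32, 64, 128, 256]
print(band_envelope(blocks, edges))
```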
[00112] It should be noted that the current envelope 133 can be determined based on one or more additional blocks 131 of transform coefficients adjacent to the current set 132 of blocks. This is illustrated in Figure 20, where the current envelope 133 (indicated by the quantized current envelope 134) is determined based on the blocks 131 of the current set 132 of blocks and based on the block 201 of the set of blocks preceding the current set 132 of blocks. In the illustrated example, the current envelope 133 is determined based on five blocks 131. By taking adjacent blocks into account when determining the current envelope 133, a continuity of the envelopes of adjacent sets 132 of blocks can be ensured.

[00113] When determining the current envelope 133, the transform coefficients of the different blocks 131 can be weighted. In particular, the outermost blocks 201, 202 that are taken into account for determining the current envelope 133 may have a lower weight than the remaining blocks 131. By way of example, the transform coefficients of the outermost blocks 201, 202 may be weighted with 0.5, whereas the transform coefficients of the other blocks 131 may be weighted with 1.

[00114] It should be noted that, in a similar manner to considering the blocks 201 of a preceding set 132 of blocks, one or more blocks (so-called look-ahead blocks) of a directly following set 132 of blocks may be considered for determining the current envelope 133.

[00115] The energy values of the current envelope 133 can be represented on a logarithmic scale (e.g., on a dB scale). The current envelope 133 may be provided to an envelope quantization unit 103, which is configured to quantize the energy values of the current envelope 133. The envelope quantization unit 103 may provide a predetermined quantizer resolution, e.g. a resolution of 3 dB. The quantization indices of the envelope 133 may be provided as envelope data 161 within a bit stream generated by the encoder 100. Furthermore, the quantized envelope 134, i.e. the envelope comprising the quantized energy values of the envelope 133, can be provided to an interpolation unit 104.
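A minimal sketch of the envelope quantization with a 3 dB resolution, and of the per-block linear interpolation between a previous and a current quantized envelope, follows; the weighting scheme that makes the last interpolated envelope coincide with the current quantized envelope is one plausible choice, not a mandated one.

```python
import numpy as np

def quantize_envelope_db(env_db, step_db=3.0):
    """Uniform quantization of envelope energies on a dB scale; the 3 dB
    step mirrors the resolution mentioned for the envelope quantizer."""
    idx = np.round(env_db / step_db).astype(int)   # envelope data 161
    return idx, idx * step_db                      # quantized envelope 134

def interpolate_envelopes(prev_q_db, curr_q_db, num_blocks=4):
    """Linear interpolation, per band, between the previous and the current
    quantized envelope, yielding one interpolated envelope per block; the
    last envelope equals the current quantized envelope."""
    weights = np.arange(1, num_blocks + 1) / num_blocks
    return [prev_q_db + w * (curr_q_db - prev_q_db) for w in weights]
```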
[00116] The interpolation unit 104 is configured to determine an envelope for each block 131 of the current set 132 of blocks, based on the current quantized envelope 134 and based on the previous quantized envelope 135 (which was determined for the set 132 of blocks directly preceding the current set 132 of blocks). The operation of the interpolation unit 104 is illustrated in Figures 20, 21a and 21b. Figure 20 shows a sequence of blocks 131 of transform coefficients. The sequence of blocks 131 is grouped into successive sets 132 of blocks, where each set 132 of blocks is used to determine a quantized envelope, e.g. the current quantized envelope 134 and the previous quantized envelope 135. Figure 21a shows examples of a previous quantized envelope 135 and a current quantized envelope 134. As noted above, the envelopes can be indicative of spectral energy 303 (e.g., on a dB scale). The corresponding energy values 303 of the previous quantized envelope 135 and of the current quantized envelope 134 for a same frequency band 302 can be interpolated (for example, using linear interpolation) to determine an interpolated envelope 136. In other words, the energy values 303 of a particular frequency band 302 may be interpolated to provide the energy value 303 of the interpolated envelope 136 within the particular frequency band 302.

[00117] It should be noted that the set of blocks for which the interpolated envelopes 136 are determined and applied may differ from the current set 132 of blocks, based on which the current quantized envelope 134 is determined. This is illustrated in Figure 20, which shows a shifted set 332 of blocks that is shifted compared to the current set 132 of blocks and that comprises blocks 3 and 4 of the previous set 132 of blocks (indicated by reference numerals 203 and 201, respectively) and blocks 1 and 2 of the current set 132 of blocks (indicated by reference numerals 204 and 205, respectively). In fact, the interpolated envelopes 136 determined based on the current quantized envelope 134 and based on the previous quantized envelope 135 may have an increased relevance for the blocks of the shifted set 332 of blocks, compared to their relevance for the blocks of the current set 132 of blocks.

[00118] Therefore, the interpolated envelopes 136 shown in Figure 21b can be used to flatten the blocks 131 of the shifted set 332 of blocks. This is shown by Figure 21b in combination with Figure 20. It can be seen that the interpolated envelope 341 of Figure 21b can be applied to the block 203 of Figure 20, that the interpolated envelope 342 of Figure 21b can be applied to the block 201 of Figure 20, that the interpolated envelope 343 of Figure 21b can be applied to the block 204 of Figure 20, and that the interpolated envelope 344 of Figure 21b (which, in the illustrated example, corresponds to the current quantized envelope 134) can be applied to the block 205 of Figure 20. As such, the set 132 of blocks used for determining the current quantized envelope 134 may differ from the shifted set 332 of blocks for which the interpolated envelopes 136 are determined and to which the interpolated envelopes 136 are applied (for flattening purposes). In particular, the current quantized envelope 134 can be determined using a certain look-ahead with respect to the blocks 203, 201, 204, 205 of the shifted set 332 of blocks that are to be flattened using the current quantized envelope 134. This is beneficial from a continuity point of view.

[00119] The interpolation of the energy values 303 to determine the interpolated envelopes 136 is illustrated in Figure 21b. It can be seen that, by interpolating between an energy value of the previous quantized envelope 135 and the corresponding energy value of the current quantized envelope 134, the energy values of the interpolated envelopes 136 can be determined for the blocks 131 of the shifted set 332 of blocks. In particular, for each block 131 of the shifted set 332, an interpolated envelope 136 can be determined, thus providing a plurality of interpolated envelopes 136 for the plurality of blocks 203, 201, 204, 205 of the shifted set 332 of blocks. The interpolated envelope 136 of a block 131 of transform coefficients (e.g., any of the blocks 203, 201, 204, 205 of the shifted set 332 of blocks) can be used to encode the block 131 of transform coefficients.
It should be noted that the quantization indices 161 of the current envelope 133 are provided to the corresponding decoder within the bit stream. Consequently, the corresponding decoder can be configured to determine the plurality of interpolated envelopes 136 in a manner analogous to the interpolation unit 104 of the encoder 100.

[00120] The framing unit 101, the envelope estimation unit 102, the envelope quantization unit 103 and the interpolation unit 104 operate on a set of blocks (i.e. the current set 132 of blocks and/or the shifted set 332 of blocks). On the other hand, the actual encoding of the transform coefficients can be performed on a block-by-block basis. In the following, reference is made to the encoding of a current block 131 of transform coefficients, which may be any one of the plurality of blocks 131 of the shifted set 332 of blocks (or, possibly, of the current set 132 of blocks in other implementations of the transform-based speech encoder 100).

[00121] The current interpolated envelope 136 for the current block 131 may provide an approximation of the spectral envelope of the transform coefficients of the current block 131. The encoder 100 may comprise a pre-flattening unit 105 and an envelope gain determination unit 106, which are configured to determine an adjusted envelope 139 for the current block 131, based on the current interpolated envelope 136 and on the current block 131. In particular, an envelope gain a for the current block 131 can be determined so that a variance of the flattened transform coefficients of the current block 131 is adjusted. Let X(k), k = 1, ..., K, be the transform coefficients of the current block 131 (with, for example, K = 256), and let E(k), k = 1, ..., K, be the mean spectral energy values 303 of the current interpolated envelope 136 (with the energy values E(k) of a same frequency band 302 being equal). The envelope gain a can be determined so that the variance of the flattened transform coefficients X̃(k) = X(k)/√(a·E(k)) is adjusted. In particular, the envelope gain a can be determined so that the variance is one.

[00122] It should be noted that the envelope gain a can be determined for a sub-range of the full frequency range of the current block 131 of transform coefficients. In other words, the envelope gain a can be determined only based on a subset of the frequency bins 301 and/or only based on a subset of the frequency bands 302. By way of example, the envelope gain a can be determined based on the frequency bins 301 greater than a start frequency bin 304 (wherein the start frequency bin is greater than 0 or 1). As a consequence, the adjusted envelope 139 for the current block 131 can be determined by applying the envelope gain a only to the mean spectral energy values 303 of the current interpolated envelope 136 that are associated with frequency bins 301 lying above the start frequency bin 304. Thus, the adjusted envelope 139 for the current block 131 may correspond to the current interpolated envelope 136 for frequency bins 301 at and below the start frequency bin, and may correspond to the current interpolated envelope 136 offset by the envelope gain a for frequency bins 301 above the start frequency bin. This is illustrated in Figure 21a by the adjusted envelope 339 (shown in dashed lines).
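The determination of the envelope gain and the flattening can be sketched as follows, assuming linear-domain band energies E(k) expanded per bin; choosing the gain as the mean of X(k)²/E(k) over the selected bins makes the variance of the flattened coefficients equal to one, as stated above. Names are illustrative.

```python
import numpy as np

def flatten_block(X, E, start_bin=0):
    """Determine an envelope gain such that the flattened coefficients
    X(k) / sqrt(gain * E(k)) have unit variance over bins k >= start_bin,
    then apply the adjusted envelope (illustrative sketch)."""
    sel = slice(start_bin, len(X))
    gain = np.mean(X[sel] ** 2 / E[sel])   # unit-variance condition
    adjusted = E.copy()
    adjusted[sel] = gain * E[sel]          # adjusted envelope 139
    return X / np.sqrt(adjusted), gain
```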
[00123] Applying the envelope gain a 137 (which is also referred to as a level correction gain) to the current interpolated envelope 136 corresponds to an adjustment or an offset of the current interpolated envelope 136, thus yielding the adjusted envelope 139, as illustrated by Figure 21a. The envelope gain a 137 can be encoded as gain data 162 in the bit stream.

[00124] The encoder 100 may further comprise an envelope refinement unit 107, which is configured to determine the adjusted envelope 139 based on the envelope gain a 137 and based on the current interpolated envelope 136. The adjusted envelope 139 can be used for the signal processing of the block 131 of transform coefficients. The envelope gain a 137 can be quantized to a higher resolution (e.g. in 1 dB steps) compared to the current interpolated envelope 136 (which may be quantized in 3 dB steps). As such, the adjusted envelope 139 can be quantized to the higher resolution of the envelope gain a 137 (e.g. in 1 dB steps).

[00125] Furthermore, the envelope refinement unit 107 may be configured to determine an allocation envelope 138. The allocation envelope 138 may correspond to a quantized version of the adjusted envelope 139 (e.g., quantized to 3 dB quantization levels). The allocation envelope 138 may be used for bit allocation purposes. In particular, the allocation envelope 138 may be used to determine, for a particular transform coefficient of the current block 131, a particular quantizer from a predetermined set of quantizers, wherein the particular quantizer is to be used to quantize the particular transform coefficient.

[00126] The encoder 100 comprises a flattening unit 108 configured to flatten the current block 131 using the adjusted envelope 139, thus yielding the block 140 of flattened transform coefficients X̃(k). The block 140 of flattened transform coefficients X̃(k) can be encoded using a prediction loop within the transform domain. As such, the block 140 may be encoded using a subband estimator 117. The prediction loop comprises a difference unit 115 configured to determine a block 141 of prediction error coefficients Δ(k), based on the block 140 of flattened transform coefficients X̃(k) and based on a block 150 of estimated transform coefficients X̂(k), e.g. Δ(k) = X̃(k) - X̂(k). It should be noted that, because the block 140 comprises flattened transform coefficients, i.e. transform coefficients that have been normalized or flattened using the energy values 303 of the adjusted envelope 139, the block 150 of estimated transform coefficients also comprises estimates of flattened transform coefficients. In other words, the difference unit 115 operates in the so-called flattened domain. Consequently, the block 141 of prediction error coefficients Δ(k) is represented in the flattened domain.

[00127] The block 141 of prediction error coefficients Δ(k) may exhibit a variance that differs from one. The encoder 100 may comprise a rescaling unit 111 configured to rescale the prediction error coefficients Δ(k) to yield a block 142 of rescaled error coefficients. The rescaling unit 111 may make use of one or more predetermined heuristic rules to perform the rescaling. As a result, the block 142 of rescaled error coefficients exhibits a variance that is (on average) closer to one (compared to the block 141 of prediction error coefficients). This can be beneficial for the subsequent quantization and encoding.
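The flattened-domain prediction loop just described reduces, per block, to the following; the single per-block rescaling factor stands in for the one or more heuristic rules mentioned above and is an assumption of the sketch.

```python
import numpy as np

def prediction_residual(flattened, estimated, rescale):
    """Difference unit 115 followed by the rescaling unit 111, both in the
    flattened domain: the prediction error D(k) = X~(k) - X^(k) (block 141)
    is divided by a heuristic per-block factor so that the resulting block
    142 of rescaled error coefficients has a variance closer to one."""
    delta = np.asarray(flattened) - np.asarray(estimated)   # block 141
    return delta / rescale                                  # block 142
```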
[00128] The encoder 100 comprises a coefficient quantization unit 112 configured to quantize the block 141 of prediction error coefficients or the block 142 of rescaled error coefficients. The coefficient quantization unit 112 may comprise or may make use of a set of predetermined quantizers. The set of predetermined quantizers may provide quantizers with different degrees of precision or different resolutions. This is illustrated in Figure 22, where different quantizers 321, 322, 323 are illustrated. The different quantizers can provide different levels of precision (indicated by the different dB values). A particular quantizer of the plurality of quantizers 321, 322, 323 may correspond to a particular value of the allocation envelope 138. As such, an energy value of the allocation envelope 138 may point to a corresponding quantizer of the plurality of quantizers. In this way, the determination of an allocation envelope 138 can simplify the process of selecting the quantizer to be used for a particular error coefficient. In other words, the allocation envelope 138 can simplify the bit allocation process.

[00129] The set of quantizers may comprise one or more quantizers 322 that make use of dithering to randomize the quantization error. This is illustrated in Figure 22, which shows a first predetermined set 326 of quantizers comprising a subset 324 of dithered quantizers and a second predetermined set 327 of quantizers comprising a subset 325 of dithered quantizers. As such, the coefficient quantization unit 112 may make use of different sets 326, 327 of predetermined quantizers, wherein the set of predetermined quantizers to be used by the coefficient quantization unit 112 may depend on a control parameter 146 provided by the estimator 117 and/or determined based on other side information available at the encoder and at the corresponding decoder. In particular, the coefficient quantization unit 112 can be configured to select a set 326, 327 of predetermined quantizers for quantizing the block 142 of rescaled error coefficients based on the control parameter 146, where the control parameter 146 may depend on one or more estimator parameters provided by the estimator 117. The one or more estimator parameters may be indicative of the quality of the block 150 of estimated transform coefficients provided by the estimator 117.

[00130] The quantized error coefficients can be entropy encoded using, for example, a Huffman code, thus yielding the coefficient data 163 to be included in the bit stream generated by the encoder 100.

[00131] In the following, further details regarding the selection or determination of a set 326 of quantizers 321, 322, 323 are described. A set 326 of quantizers may correspond to an ordered collection 326 of quantizers. The ordered collection 326 of quantizers may comprise N quantizers, where each quantizer may correspond to a different distortion level. As such, the collection 326 of quantizers can provide N possible levels of distortion. The quantizers of the collection 326 can be ordered according to decreasing distortion (or, equivalently, according to increasing SNR). Furthermore, the quantizers can be labeled by integer labels. By way of example, the quantizers can be labeled 0, 1, 2, etc., where an increasing integer label may indicate an increasing SNR.

[00132] The collection 326 of quantizers can be such that the SNR gap between two consecutive quantizers is at least approximately constant. For example, the SNR of the quantizer with label "1" might be 1.5 dB and the SNR of the quantizer with label "2" might be 3.0 dB.
Hence, the quantizers of the ordered collection 326 of quantizers can be such that, by changing from a first quantizer to an adjacent second quantizer, the SNR (signal-to-noise ratio) is increased by a substantially constant amount (e.g., 1.5 dB), for all pairs of first and second quantizers.

[00133] The collection 326 of quantizers may comprise:
• a noise filling quantizer 321, which can provide an SNR that is slightly lower than or equal to 0 dB and which, for the rate allocation process, can be approximated as 0 dB;
• Ndith dithered quantizers 322, which can use subtractive dithering and which typically correspond to intermediate SNR levels (e.g., Ndith > 0); and
• Ncq classical quantizers 323, which do not use subtractive dithering and which typically correspond to relatively high SNR levels (e.g., Ncq > 0). The non-dithered quantizers 323 may correspond to scalar quantizers.

[00134] The total number N of quantizers is given by N = 1 + Ndith + Ncq.

[00135] An example of a collection 326 of quantizers is shown in Figure 24a. The noise filling quantizer 321 of the collection 326 of quantizers can be implemented, for example, using a random number generator that outputs a realization of a random variable according to a predefined statistical model.

[00136] Additionally, the collection 326 of quantizers may comprise one or more dithered quantizers 322. The one or more dithered quantizers may be generated using a realization of a pseudo-random dither signal 602, as shown in Figure 24a. The pseudo-random dither signal 602 may correspond to a block 602 of pseudo-random dither values. The block 602 of dither values can have the same dimensionality as the block 142 of rescaled error coefficients that are to be quantized. The dither signal 602 (or the block 602 of dither values) can be generated using a dither generator 601. In particular, the dither signal 602 can be generated using a lookup table containing uniformly distributed random samples.

[00137] As will be shown in the context of Figure 24b, the individual dither values 632 of the block 602 of dither values are used to apply a dither to a corresponding coefficient that is to be quantized (e.g. to a corresponding rescaled error coefficient of the block 142 of rescaled error coefficients). The block 142 of rescaled error coefficients may comprise a total of K rescaled error coefficients. In a similar manner, the block 602 of dither values may comprise K dither values 632. The kth dither value 632, with k = 1, ..., K, of the block 602 of dither values may be applied to the kth rescaled error coefficient of the block 142 of rescaled error coefficients.

[00138] As indicated above, the block 602 of dither values can have the same dimension as the block 142 of rescaled error coefficients that are to be quantized. This is beneficial in that it allows the use of a single block 602 of dither values for all the dithered quantizers 322 of a collection 326 of quantizers. In other words, in order to quantize and encode a given block 142 of rescaled error coefficients, the pseudo-random dither 602 can be generated only once for all admissible collections 326, 327 of quantizers and for all possible allocations of the distortion. This facilitates achieving synchronization between the encoder 100 and the corresponding decoder, as the use of the single dither signal 602 need not be explicitly signaled to the corresponding decoder. In particular, the encoder 100 and the corresponding decoder can make use of the same dither generator 601, which is configured to generate the same block 602 of dither values for the block 142 of rescaled error coefficients.
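A dither generator 601 driven by a lookup table of uniformly distributed samples, shared verbatim by encoder and decoder as required above, can be sketched as follows; the table size, the seed and the class name are illustrative.

```python
import numpy as np

class DitherGenerator:
    """Sketch of a dither generator 601 based on a lookup table of
    uniformly distributed samples; encoder and decoder instantiate it with
    the same table and the same read position, so both produce the
    identical block 602 of dither values."""
    def __init__(self, table, pos=0):
        self.table = np.asarray(table)
        self.pos = pos

    def block(self, n):
        idx = (self.pos + np.arange(n)) % len(self.table)
        self.pos = (self.pos + n) % len(self.table)
        return self.table[idx]           # values in [0, 1)

table = np.random.RandomState(1234).rand(4096)  # shared, fixed table
gen = DitherGenerator(table)
z = gen.block(256)                              # one dither value per coefficient
```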
[00139] The composition of the collection 326 of quantizers is preferably based on psychoacoustic considerations. Low rate transform coding can lead to spectral artifacts, including spectral holes and band limiting, which are triggered by the nature of the reverse water filling process that occurs in conventional quantization schemes applied to transform coefficients. The audibility of the spectral holes can be reduced by injecting noise into those frequency bands 302 which fall below the water level for a short period of time and which have therefore been allocated a zero bit rate.

[00140] In general, it is possible to achieve an arbitrarily low bit rate with a dithered quantizer 322. For example, in the scalar case, one may choose to use a very large quantization step size. However, zero bit rate operation is not feasible in practice, as it would impose demanding requirements on the numerical precision needed to allow the quantizer to operate with a variable length encoder. This provides the motivation for applying a generic noise filling quantizer 321 at the 0 dB SNR distortion level, rather than applying a dithered quantizer 322. The proposed collection 326 of quantizers is designed so that the dithered quantizers 322 are used for distortion levels that are associated with relatively small step sizes, so that variable length encoding can be implemented without having to deal with problems related to maintaining numerical precision.

[00141] For the case of scalar quantization, the quantizers 322 with subtractive dither can be implemented using post-gains that provide near-optimal MSE performance. An example of a subtractive dithered scalar quantizer 322 is shown in Figure 24b. The dithered quantizer 322 comprises a uniform scalar quantizer Q 612 that is used within a subtractive dithering structure. The subtractive dithering structure comprises a dither subtraction unit 611, which is configured to subtract a dither value 632 (of the block 602 of dither values) from a corresponding error coefficient (of the block 142 of rescaled error coefficients). Furthermore, the subtractive dithering structure comprises a corresponding dither addition unit 613, which is configured to add the dither value 632 (of the block 602 of dither values) to the corresponding scalar quantized error coefficient. In the illustrated example, the dither subtraction unit 611 is placed upstream of the scalar quantizer Q 612 and the dither addition unit 613 is placed downstream of the scalar quantizer Q 612. The dither values 632 of the block 602 of dither values may take values in the interval [-0.5, 0.5) or [0, 1), times the step size Δ of the scalar quantizer 612. It should be noted that, in an alternative implementation of the dithered quantizer 322, the dither subtraction unit 611 and the dither addition unit 613 may be interchanged.

[00142] The subtractive dithering structure can be followed by a scaling unit 614, which is configured to rescale the quantized error coefficients by a quantizer post-gain γ. Subsequent to the rescaling of the quantized error coefficients, the block 145 of quantized error coefficients is obtained.
It should be noted that the input X to the dithered quantizer 322 typically corresponds to the coefficients of the block 142 of rescaled error coefficients that fall within the particular frequency band that is to be quantized using the dithered quantizer 322. In a similar manner, the output of the dithered quantizer 322 typically corresponds to the quantized coefficients of the block 145 of quantized error coefficients that fall within the particular frequency band.

[00143] It can be assumed that the input X to the dithered quantizer 322 is zero-mean and that the variance of the input X is known. (For example, the signal variance can be determined from the signal envelope.) Furthermore, it can be assumed that a block 602 of pseudo-random dither Z comprising the dither values 632 is available to the encoder 100 and to the corresponding decoder, and that the dither values 632 are independent of the input X. Several different dithers 602 can be used, but it is assumed in the following that the dither Z 602 is uniformly distributed between 0 and Δ, which can be denoted by U(0, Δ). The Schuchman conditions can be used (e.g., a dither 602 that is uniformly distributed between [-0.5, 0.5) times the step size Δ of the scalar quantizer 612).

[00144] The quantizer Q 612 can be a lattice quantizer and the extent of its Voronoi cell can be Δ. In that case, the dither signal would have a uniform distribution over the Voronoi cell of the lattice that is used.

[00145] The quantizer post-gain γ can be derived given the signal variance and the quantization step size, since the dithered quantizer is analytically tractable for any step size (i.e., for any bit rate). In particular, the post-gain can be derived to improve the MSE performance of a quantizer with subtractive dither. The post-gain can be given by γ = σX² / (σX² + Δ²/12), where σX² denotes the variance of the input X and Δ²/12 is the variance of the quantization error of the subtractive dithering structure.

[00146] Although the application of the post-gain γ can improve the MSE performance of the dithered quantizer 322, a dithered quantizer 322 typically has a lower MSE performance than a quantizer without dither (although this difference vanishes as the bit rate increases). Consequently, in general, dithered quantizers are noisier than their non-dithered versions. Therefore, it may be desirable to use the dithered quantizers 322 only when their use is justified by the beneficial perceptual noise filling property of the dithered quantizers 322.

[00147] Hence, a collection 326 of quantizers comprising three types of quantizers can be provided. The ordered collection 326 of quantizers may comprise a single noise filling quantizer 321, one or more subtractive dithered quantizers 322 and one or more classical (non-dithered) quantizers 323. Consecutive quantizers 321, 322, 323 may provide incremental improvements of the SNR. The incremental improvements between a pair of adjacent quantizers of the ordered collection 326 of quantizers may be substantially constant for some or all of the pairs of adjacent quantizers.
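Combining the structure of Figure 24b with the post-gain discussed above gives the following sketch of a subtractive dithered scalar quantizer; the mapping of dither values from [0, 1) to [-0.5, 0.5) times the step size is one of the two conventions mentioned in the text, the post-gain formula is the standard MMSE gain for subtractive dithering, and all names are illustrative.

```python
import numpy as np

def dithered_quantize(x, z, step, var_x):
    """Subtractive dithered scalar quantizer 322: subtract the dither,
    quantize uniformly with step size `step`, add the dither back, then
    apply the MSE post-gain gamma = var_x / (var_x + step**2 / 12)."""
    dither = (z - 0.5) * step                      # z in [0, 1)
    q = np.round((x - dither) / step) * step + dither
    gamma = var_x / (var_x + step ** 2 / 12.0)     # post-gain of unit 614
    return gamma * q
```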
[00148] A particular collection 326 of quantizers can be defined by the number of dithered quantizers 322 and the number of non-dithered quantizers 323 comprised within the particular collection 326. Furthermore, the particular collection 326 of quantizers can be defined by a particular realization of the dither signal 602. The collection 326 can be designed to provide perceptually efficient quantization of the transform coefficients, providing: noise filling at zero rate (yielding an SNR slightly lower than or equal to 0 dB); noise filling by subtractive dithering at intermediate distortion levels (intermediate SNR); and no noise filling at low distortion levels (high SNR). The collection 326 provides the set of admissible quantizers that can be selected during a rate allocation process. The application of a particular quantizer of the collection 326 of quantizers to the coefficients of a particular frequency band 302 is determined during the rate allocation process. It is typically not known a priori which quantizer will be used to quantize the coefficients of a particular frequency band 302. However, it is typically known a priori what the composition of the collection 326 of quantizers is.

[00149] The use of different types of quantizers for different frequency bands 302 of a block 142 of error coefficients is illustrated in Figure 24c, in which an exemplary result of the rate allocation process is shown. In this example, it is assumed that the rate allocation follows the so-called reverse water filling principle. Figure 24c illustrates the spectrum 625 of an input signal (or the envelope of the blocks of coefficients to be quantized). It can be seen that the frequency band 623 has relatively high spectral energy and is quantized using a classical quantizer 323, which provides a relatively low level of distortion. The frequency bands 622 exhibit spectral energy above the water level 624. The coefficients in these frequency bands 622 can be quantized using dithered quantizers 322, which provide intermediate levels of distortion. The frequency bands 621 exhibit spectral energy below the water level 624. The coefficients in these frequency bands 621 can be quantized using zero rate noise filling. The different quantizers used to quantize a particular block of coefficients (represented by the spectrum 625) may be part of a particular collection 326 of quantizers that has been determined for that particular block of coefficients.

[00150] Hence, the three different types of quantizers 321, 322, 323 can be applied selectively (e.g., selectively with respect to frequency). The decision to apply a particular type of quantizer can be reached in the context of a rate allocation procedure, which is described below. The rate allocation procedure can make use of a perceptual criterion that can be derived from the RMS envelope of the input signal (or, for example, from the power spectral density of the signal). The type of quantizer to be applied to a particular frequency band 302 need not be explicitly signaled to the corresponding decoder. The need to signal the selected type of quantizer is eliminated, since the corresponding decoder can determine the particular set 326 of quantizers that was used to quantize a block of the input signal from the perceptual criterion derived for that signal (e.g., the allocation envelope 138), from the predetermined composition of the collections of quantizers (e.g., a predetermined set of different collections of quantizers) and from a single global rate allocation parameter (also referred to as an offset parameter).
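The determination of the quantizer per frequency band from the allocation envelope 138 and the single offset parameter can be sketched as follows; the linear mapping and the 1.5 dB step are illustrative stand-ins for the perceptually motivated rule, chosen to mirror the SNR ladder described above.

```python
import numpy as np

def select_quantizers(alloc_env_db, offset_db, step_db=1.5, num_quantizers=16):
    """Map the allocation envelope and a global offset parameter to one
    quantizer label per frequency band: louder bands get finer quantizers,
    bands falling below the 'water level' get label 0, i.e. zero-rate
    noise filling (illustrative sketch)."""
    labels = np.floor((np.asarray(alloc_env_db) - offset_db) / step_db)
    return np.clip(labels.astype(int), 0, num_quantizers - 1)
```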
[00151] The determination, at the decoder, of the collection 326 of quantizers that was used by the encoder 100 is facilitated by structuring the collection 326 of quantizers so that the quantizers are ordered according to their distortion (e.g., their SNR). Each quantizer of the collection 326 can decrease the distortion (i.e., refine the SNR) of the previous quantizer by a constant amount. Furthermore, a particular collection 326 of quantizers can be associated with a single realization of a pseudo-random dither signal 602 during the entire rate allocation process. As a result, the outcome of the rate allocation procedure does not affect the realization of the dither signal 602. This is beneficial for ensuring convergence of the rate allocation procedure. Furthermore, this allows the decoder to perform decoding, provided that the decoder knows the single realization of the dither signal 602. The decoder can be made aware of the realization of the dither signal 602 by using the same pseudo-random dither generator 601 at the encoder 100 and at the corresponding decoder.

[00152] As noted above, the encoder 100 can be configured to perform a bit allocation process. To this end, the encoder 100 may comprise bit allocation units 109, 110. The bit allocation unit 109 may be configured to determine the total number of bits 143 that are available for encoding the current block 142 of rescaled error coefficients. The total number of bits 143 can be determined based on the allocation envelope 138. The bit allocation unit 110 can be configured to provide a relative allocation of the bits to the different rescaled error coefficients, depending on the corresponding energy value in the allocation envelope 138.

[00153] The bit allocation process may make use of an iterative allocation procedure. In the course of the allocation procedure, the allocation envelope 138 can be offset using an offset parameter, thereby selecting quantizers with increased or decreased resolution. As such, the offset parameter can be used to refine or coarsen the overall quantization. The offset parameter can be determined so that the coefficient data 163, obtained using the quantizers given by the offset parameter and the allocation envelope 138, comprises a number of bits that corresponds to (or does not exceed) the total number of bits 143 allocated to the current block 131. The offset parameter that was used by the encoder 100 for encoding the current block 131 is included as coefficient data 163 in the bit stream. As a consequence, the corresponding decoder is able to determine the quantizers that were used by the coefficient quantization unit 112 to quantize the block 142 of rescaled error coefficients.

[00154] As such, the rate allocation process can be performed at the encoder 100 and aims at distributing the available bits 143 according to a perceptual model. The perceptual model may depend on the allocation envelope 138 derived from the block 131 of transform coefficients. The rate allocation algorithm distributes the available bits 143 among the different types of quantizers, i.e. the zero rate noise filling 321, the one or more dithered quantizers 322 and the one or more classical non-dithered quantizers 323. The final decision on the type of quantizer to be used for quantizing the coefficients of a particular frequency band 302 of the spectrum may depend on the perceptual signal model, on the realization of the pseudo-random dither and on the bit rate constraint.
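The iterative search for the offset parameter described above can be sketched as a simple bisection, under the assumption (consistent with the text) that the bit cost decreases monotonically as the offset coarsens the quantization; cost_bits is a hypothetical callable returning the bit cost of encoding the block for a given offset.

```python
def find_offset(cost_bits, total_bits, lo=-60.0, hi=60.0, iters=20):
    """Search the offset parameter so that the number of bits produced by
    the implied quantizer selection does not exceed the total bit budget
    143; assumes cost_bits(offset) decreases as the offset increases."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cost_bits(mid) > total_bits:
            lo = mid        # too expensive: coarsen (raise the offset)
        else:
            hi = mid        # fits the budget: try to refine
    return hi               # feasible end of the bracket
```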
[00155] At the corresponding decoder, the bit allocation (indicated by the allocation envelope 138 and the offset parameter) can be used to determine the probabilities of the quantization indices, in order to enable lossless decoding. A method for computing the quantization index probabilities can be used that makes use of the full-band realization of the pseudo-random dither 602, of the perceptual model parameterized by the signal envelope 138 and of the rate allocation parameter (i.e., the offset parameter). Using the allocation envelope 138, the offset parameter and the knowledge of the block 602 of dither values, the composition of the collection 326 of quantizers at the decoder can be kept in sync with the collection 326 used at the encoder 100.

[00156] As outlined above, the bit rate constraint can be specified in terms of a maximum allowed number of bits per frame 143. This applies, for example, to quantization indices that are subsequently entropy encoded using, for example, a Huffman code. In particular, this applies to coding scenarios where the bit stream is generated in a sequential manner, where a single parameter is quantized at a time, and where the corresponding quantization index is converted into a binary codeword that is appended to the bit stream.

[00157] If arithmetic coding (or range coding) is used, the principle is different. In the context of arithmetic coding, a single codeword is typically assigned to a long sequence of quantization indices. It is typically not possible to exactly associate a particular portion of the bit stream with a particular parameter. In particular, in the context of arithmetic coding, the number of bits required to encode a random realization of a signal is typically unknown. This is the case even if the statistical model of the signal is known.

[00158] In order to address this technical problem, it is proposed to make the arithmetic encoder a part of the rate allocation algorithm. During the rate allocation process, the encoder tries to quantize and encode the set of coefficients of one or more frequency bands 302. For each trial, it is possible to observe the change of state of the arithmetic encoder and to compute the number of positions by which the bit stream advances (instead of counting whole bits). If a maximum bit rate constraint is set, it can be used in the rate allocation procedure. The cost of the termination bits of the arithmetic code can be included in the cost of the last encoded parameter and, in general, the cost of the termination bits will vary depending on the state of the arithmetic encoder. Nevertheless, once the termination cost is available, it is possible to determine the number of bits needed to encode the quantization indices corresponding to the set of coefficients of the one or more frequency bands 302.

[00159] It should be noted that, in the context of arithmetic coding, a single dither realization 602 can be used for the entire rate allocation process (of a particular block 142 of coefficients). As outlined above, the arithmetic encoder can be used to estimate the bit rate cost of a particular quantizer selection within the rate allocation procedure. The change of state of the arithmetic encoder can be observed, and that state change can be used to compute the number of bits needed to perform the quantization. Furthermore, the termination process of the arithmetic code can be used within the rate allocation process.
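For an idealized arithmetic encoder, the bit cost observed during one such trial is close to the sum of the self-informations of the quantization indices, which the following sketch computes; a real implementation would instead observe the encoder state change and the termination cost, so this is only an approximation for illustration.

```python
import math

def trial_cost_bits(index_probs):
    """Approximate bit cost of one rate-allocation trial: for an idealized
    arithmetic coder, the cost of a sequence of quantization indices is
    close to the sum of their self-informations, -log2(p), so the encoder
    can estimate how far the bit stream would advance without emitting
    any bits."""
    return sum(-math.log2(p) for p in index_probs)

# Example: cost of coding four indices with the given model probabilities.
print(trial_cost_bits([0.5, 0.25, 0.125, 0.125]))  # 8.0 bits
```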
[00160] As indicated above, the quantization indices can be encoded using an arithmetic code or an entropy code. If the quantization indices are entropy encoded, the probability distribution of the quantization indices can be taken into account in order to assign variable length codewords to individual quantization indices or to groups of quantization indices. The use of dithering can have an impact on the probability distribution of the quantization indices. In particular, the particular realization of the dither signal 602 can have an impact on the probability distribution of the quantization indices. Due to the virtually unlimited number of realizations of the dither signal 602, in the general case the codeword probabilities are not known a priori, and Huffman coding cannot be used.

[00161] It has been observed by the inventors that it is possible to reduce the number of possible dither realizations to a relatively small and manageable set of realizations of the dither signal 602. By way of example, for each frequency band 302, a limited set of dither values can be provided. To this end, the encoder 100 (as well as the corresponding decoder) may comprise a discrete dither generator 801 configured to generate the dither signal 602 by selecting one of M predetermined dither realizations (see Figure 26). By way of example, M different predetermined dither realizations can be used for each frequency band 302. The number M of predetermined dither realizations can be M < 5 (e.g., M = 4 or M = 3).

[00162] Due to the limited number M of dither realizations, it is possible to train a (possibly multidimensional) Huffman codebook for each dither realization, yielding a collection 803 of M codebooks. The encoder 100 may comprise a codebook selection unit 802, which is configured to select one codebook of the collection 803 of M predetermined codebooks, based on the selected dither realization. Doing so ensures that the entropy coding is in sync with the dither generation. The selected codebook 811 can be used to encode the individual quantization indices, or groups of quantization indices, that have been quantized using the selected dither realization. As a consequence, the entropy coding performance can be improved when dithered quantizers are used.

[00163] The collection 803 of predetermined codebooks and the discrete dither generator 801 can also be used at the corresponding decoder (as illustrated in Figure 26). Decoding is feasible if pseudo-random dithering is used and the decoder remains in sync with the encoder 100. In that case, the discrete dither generator 801 at the decoder generates the dither signal 602, and the particular dither realization is uniquely associated with a particular Huffman codebook 811 of the collection 803 of codebooks. Given the psychoacoustic model (e.g., represented by the allocation envelope 138 and the rate allocation parameter) and the selected codebook 811, the decoder is able to perform decoding using a Huffman decoder 551, yielding the decoded quantization indices 812.

[00164] As such, a relatively small collection 803 of Huffman codebooks can be used instead of arithmetic coding. The use of a particular codebook 811 of the collection 803 of Huffman codebooks may depend on the predetermined realization of the dither signal 602. At the same time, a limited set of admissible dither values, forming the M predetermined dither realizations, can be used. The rate allocation process may then involve the use of non-dithered quantizers, dithered quantizers and Huffman coding.
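The pairing of a discrete dither realization with a trained Huffman codebook can be sketched as follows; the shared seed stands in for the synchronized pseudo-random generator, and codebooks is a hypothetical list of M trained codebook objects.

```python
import numpy as np

M = 4  # number of predetermined dither realizations (M < 5, see above)

def pick_dither_and_codebook(rng, codebooks):
    """Discrete dither generator 801 plus codebook selector 802: draw one of
    the M predetermined dither realizations and return the Huffman codebook
    trained for that realization, keeping the entropy coding in sync with
    the dither generation."""
    m = rng.randint(M)          # same draw on encoder and decoder side
    return m, codebooks[m]

rng = np.random.RandomState(42)  # shared seed keeps both sides in sync
```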
[00165] As a result of quantizing the scaled error coefficients, a block 145 of quantized error coefficients is obtained. The block 145 of quantized error coefficients corresponds to the block of error coefficients that is available in the corresponding decoder. Accordingly, the block 145 of quantized error coefficients can be used to determine a block 150 of estimated transform coefficients. The encoder 100 may comprise an inverse scaling unit 113 configured to perform the inverse of the scaling operations performed by the scaling unit 111, thus yielding a block 147 of scaled quantized error coefficients. An addition unit 116 may be used to determine a block 148 of reconstructed flattened coefficients by adding the block 150 of estimated transform coefficients to the block 147 of scaled quantized error coefficients. Furthermore, an inverse flattening unit 114 can be used to apply the adjusted envelope 139 to the block 148 of reconstructed flattened coefficients, thus yielding a block 149 of reconstructed coefficients. The block 149 of reconstructed coefficients corresponds to the version of the block 131 of transform coefficients that is available in the corresponding decoder. Accordingly, the block 149 of reconstructed coefficients can be used in the estimator 117 to determine the block 150 of estimated coefficients.

[00166] The block 149 of reconstructed coefficients is represented in the un-flattened domain, i.e., the block 149 of reconstructed coefficients is also representative of the spectral envelope of the current block 131. As outlined below, this can be beneficial to the performance of the estimator 117.

[00167] The estimator 117 may be configured to estimate the block 150 of estimated transform coefficients based on one or more previous blocks 149 of reconstructed coefficients. In particular, the estimator 117 can be configured to determine one or more estimator parameters such that a predetermined prediction error criterion is reduced (e.g., minimized). By way of example, the one or more estimator parameters may be determined such that an energy, or a perceptually weighted energy, of the block 141 of prediction error coefficients is reduced (e.g., minimized). The one or more estimator parameters may be included as estimator data 164 in the bit stream generated by the encoder 100.

[00168] The estimator 117 can make use of a signal model as described in Patent Application US61750052 and the patent applications claiming priority thereto, the contents of which are incorporated by reference. The one or more estimator parameters can correspond to one or more model parameters of the signal model.

[00169] Figure 19b shows a block diagram of a further exemplary transform-based speech encoder 170. The transform-based speech encoder 170 of Figure 19b comprises several of the components of the encoder 100 of Figure 19a. However, the transform-based speech encoder 170 of Figure 19b is configured to generate a bit stream that has a variable bit rate. To this end, the encoder 170 comprises an Average Bit Rate (ABR) state unit 172 configured to keep track of the bit rate used by the bit stream for the preceding blocks 131. The bit allocation unit 171 uses this information to determine the total number of bits 143 that are available for encoding the current block 131 of transform coefficients.
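The local decoder loop of paragraph [00165] — inverse scaling, addition of the prediction, and inverse flattening — can be sketched as below. This is a minimal NumPy sketch with illustrative variable names; the actual scaling rules and envelope application of the system are more elaborate.

```python
import numpy as np

def reconstruct_block(quantized_error, scale_gains, predicted, adjusted_envelope):
    scaled_error = quantized_error * scale_gains   # inverse scaling (unit 113) -> block 147
    flattened = predicted + scaled_error           # addition unit 116 -> block 148
    reconstructed = flattened * adjusted_envelope  # inverse flattening (unit 114) -> block 149
    return reconstructed

# Toy example with a 4-coefficient block.
block_149 = reconstruct_block(
    quantized_error=np.array([1.0, -1.0, 0.0, 2.0]),
    scale_gains=np.array([0.5, 0.5, 1.0, 1.0]),
    predicted=np.array([0.2, 0.1, 0.0, -0.3]),
    adjusted_envelope=np.array([4.0, 2.0, 1.0, 0.5]),
)
```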
[00170] In the following, a transform-based speech decoder 500 is described in the context of Figures 23a to 23d. Figure 23a shows a block diagram of an exemplary transform-based speech decoder 500. The block diagram shows a synthesis filter bank 504 (also referred to as an inverse transform unit) that is used to convert a block 149 of reconstructed coefficients from the transform domain into the time domain, thus yielding samples of the decoded audio signal. The synthesis filter bank 504 can make use of an inverse MDCT with a predetermined stride (e.g., a stride of approximately 5 ms, or 256 samples).

[00171] The main loop of the decoder 500 operates in units of this stride. Each stride produces a transform domain vector (also referred to as a block) that has a length or dimension corresponding to a predetermined bandwidth configuration of the system. After zero padding to the transform size of the synthesis filter bank 504, the transform domain vector is used to synthesize a time domain signal update of a predetermined length (e.g., 5 ms) for the overlap/add process of the synthesis filter bank 504.

[00172] As indicated above, generic transform-based audio codecs typically employ frames with sequences of short blocks in the 5 ms range for transient handling. As such, generic transform-based audio codecs provide the necessary transform and window-switching tools for a seamless coexistence of short and long blocks. A speech spectral frontend defined by omitting the synthesis filter bank 504 of Figure 23a can therefore be conveniently integrated into a general-purpose transform-based audio codec without the need to introduce additional switching tools. In other words, the transform-based speech decoder 500 of Figure 23a can be conveniently combined with a generic transform-based audio decoder. In particular, the transform-based speech decoder 500 of Figure 23a can make use of the synthesis filter bank 504 provided by the generic transform-based audio decoder (e.g., an AAC or HE-AAC decoder).
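The per-stride main loop of paragraph [00171] can be sketched as follows. The inverse_mdct function below is a placeholder (an inverse FFT stands in for the actual lapped transform), and the stride and transform sizes are illustrative assumptions; only the zero-padding and overlap/add structure is the point of the sketch.

```python
import numpy as np

STRIDE = 256          # e.g., approximately 5 ms at 48 kHz
TRANSFORM_SIZE = 512  # synthesis transform size (2x stride for a lapped transform)

def inverse_mdct(spectrum):
    # Placeholder for the inverse transform of the synthesis filter bank 504.
    return np.fft.irfft(spectrum, n=TRANSFORM_SIZE)

def synthesis_step(coeffs, overlap_state):
    padded = np.zeros(TRANSFORM_SIZE // 2 + 1)
    padded[:len(coeffs)] = coeffs         # zero-pad to the transform size
    frame = inverse_mdct(padded)
    out = frame[:STRIDE] + overlap_state  # overlap/add with the previous tail
    return out, frame[STRIDE:]

overlap = np.zeros(STRIDE)
out, overlap = synthesis_step(np.ones(64), overlap)  # one ~5 ms signal update
```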
[00173] From the incoming bit stream (in particular, from the envelope data 161 and the gain data 162 comprised within the bit stream), a signal envelope can be determined by an envelope decoder 503. In particular, the envelope decoder 503 can be configured to determine the adjusted envelope 139 based on the envelope data 161 and the gain data 162. As such, the envelope decoder 503 can perform tasks similar to those of the interpolation unit 104 and the envelope refinement unit 107 of the encoder 100, 170. As outlined above, the adjusted envelope 139 represents a model of the signal variance over a set of predefined frequency bands 302.

[00174] Furthermore, the decoder 500 comprises an inverse flattening unit 114 that is configured to apply the adjusted envelope 139 to a flattened domain vector, whose entries may have a nominal variance of one. The flattened domain vector corresponds to the block 148 of reconstructed flattened coefficients described in the context of the encoder 100, 170. At the output of the inverse flattening unit 114, the block 149 of reconstructed coefficients is obtained. The block 149 of reconstructed coefficients is provided to the synthesis filter bank 504 (to generate the decoded audio signal) and to the subband estimator 517.

[00175] The subband estimator 517 operates in a manner similar to the estimator 117 of the encoder 100, 170. In particular, the subband estimator 517 is configured to determine a block 150 of estimated transform coefficients (in the flattened domain) based on one or more previous blocks 149 of reconstructed coefficients (using one or more estimator parameters signaled within the bit stream). In other words, the subband estimator 517 is configured to output a predicted flattened domain vector from a buffer of previously decoded output vectors and signal envelopes, based on estimator parameters such as an estimator lag and an estimator gain. The decoder 500 comprises an estimator decoder 501 configured to decode the estimator data 164 in order to determine the one or more estimator parameters.

[00176] The decoder 500 additionally comprises a spectrum decoder 502 that is configured to provide an additive correction to the predicted flattened domain vector, typically based on the largest part of the bit stream (i.e., based on the coefficient data 163). The spectrum decoding process is primarily controlled by an allocation vector that is derived from the envelope and from a transmitted allocation control parameter (also referred to as the offset parameter). As illustrated in Figure 23a, there can be a direct dependence of the spectrum decoder 502 on the estimator parameters 520. As such, the spectrum decoder 502 can be configured to determine the block 147 of scaled quantized error coefficients based on the received coefficient data 163. As outlined in the context of the encoder 100, 170, the quantizers 321, 322, 323 used to quantize the block 142 of rescaled error coefficients typically depend on the allocation envelope 138 (which can be derived from the adjusted envelope 139) and on the offset parameter. Furthermore, the quantizers 321, 322, 323 may depend on a control parameter 146 provided by the estimator 117. The control parameter 146 can be derived by the decoder 500 using the estimator parameters 520 (in a manner analogous to the encoder 100, 170).

[00177] As indicated above, the received bit stream comprises envelope data 161 and gain data 162 that can be used to determine the adjusted envelope 139. In particular, a unit 531 of the envelope decoder 503 can be configured to determine the quantized current envelope 134 from the envelope data 161. By way of example, the quantized current envelope 134 may have a 3 dB resolution in the predefined frequency bands 302 (as indicated in Figure 21a). The quantized current envelope 134 may be updated for each set 132, 332 of blocks (e.g., every four coding units, i.e., blocks, or every 20 ms), in particular for each shifted set 332 of blocks. The frequency bands 302 of the quantized current envelope 134 may comprise an increasing number of frequency bins 301 as a function of frequency, in order to adapt to the properties of human hearing.

[00178] The quantized current envelope 134 can be interpolated linearly from the previously quantized envelope 135, in order to yield the interpolated envelopes 136 for each block 131 of the shifted set 332 of blocks (or, possibly, of the current set 132 of blocks). The interpolated envelopes 136 can be determined in the quantized 3 dB domain. This means that the interpolated energy values 303 can be rounded to the nearest 3 dB level. An example interpolated envelope 136 is illustrated by the dotted graph of Figure 21a. For each quantized current envelope 134, four level-correction gains 137 (also referred to as envelope gains) are provided as gain data 162. The gain decoding unit 532 can be configured to determine the level-correction gains 137 from the gain data 162. The level-correction gains can be quantized in 1 dB steps. Each level-correction gain is applied to the corresponding interpolated envelope 136 in order to provide the adjusted envelope 139 for the different blocks 131. Due to the increased resolution of the level-correction gains 137, the adjusted envelope 139 may have an increased resolution (e.g., a resolution of 1 dB).
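A minimal sketch of the envelope interpolation and adjustment of paragraph [00178], assuming plain linear interpolation in the dB domain: interpolated values are rounded to the 3 dB grid and then refined by the four per-block 1 dB level-correction gains. The band count and gain values are illustrative.

```python
import numpy as np

def interpolated_envelopes_db(prev_db, curr_db, num_blocks=4):
    """Interpolate between the previously/currently quantized envelopes (in dB),
    rounding each interpolated value to the nearest 3 dB level."""
    envs = []
    for i in range(1, num_blocks + 1):
        frac = i / num_blocks
        env = (1 - frac) * prev_db + frac * curr_db
        envs.append(3.0 * np.round(env / 3.0))  # quantize to the 3 dB grid
    return envs

def adjusted_envelope_db(interpolated_db, gain_db):
    return interpolated_db + gain_db            # 1 dB resolution refinement

prev = np.array([-12.0, -6.0, -3.0])            # previously quantized envelope 135
curr = np.array([-6.0, -6.0, -9.0])             # quantized current envelope 134
envs = interpolated_envelopes_db(prev, curr)    # interpolated envelopes 136
adjusted = [adjusted_envelope_db(e, g)          # adjusted envelopes 139
            for e, g in zip(envs, [1.0, 0.0, -1.0, 2.0])]
```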
[00179] Figure 21b shows an example of linear or geometric interpolation between the previously quantized envelope 135 and the quantized current envelope 134. The envelopes 135, 134 can be separated into a mean level part and a shape part of the logarithmic spectrum. These parts can be interpolated with independent strategies, such as a linear, a geometric, or a harmonic (parallel resistors) strategy. As such, different interpolation schemes can be used to determine the interpolated envelopes 136. The interpolation scheme used by the decoder 500 typically corresponds to the interpolation scheme used by the encoder 100, 170.

[00180] The envelope refinement unit 107 of the envelope decoder 503 can be configured to determine an allocation envelope 138 from the adjusted envelope 139 by quantizing the adjusted envelope 139 (e.g., in 3 dB steps). The allocation envelope 138 can be used in conjunction with the allocation control parameter, or offset parameter (comprised within the coefficient data 163), to create a nominal integer allocation vector used to control the spectral decoding, that is, the decoding of the coefficient data 163. In particular, the nominal integer allocation vector can be used to determine a quantizer for the inverse quantization of the quantization indices comprised within the coefficient data 163. The allocation envelope 138 and the nominal integer allocation vector can be determined in an analogous manner in the encoder 100, 170 and in the decoder 500.

[00181] Figure 27 illustrates an example bit allocation process based on the allocation envelope 138. As outlined above, the allocation envelope 138 can be quantized according to a predetermined resolution (e.g., a 3 dB resolution). Each quantized spectral energy value of the allocation envelope 138 can be assigned a corresponding integer value, where adjacent integer values represent a difference in spectral energy corresponding to the predetermined resolution (e.g., a 3 dB difference). The resulting set of integers can be referred to as an integer allocation envelope 1004 (referred to as iEnv). The integer allocation envelope 1004 can be offset by the offset parameter to yield the nominal integer allocation vector (referred to as iAlloc), which provides a direct indication of the quantizer to be used to quantize the coefficients of a particular frequency band 302 (identified by a frequency band index, bandIdx).

[00182] Figure 27 shows, in diagram 1003, the integer allocation envelope 1004 as a function of the frequency bands 302. It can be seen that for the frequency band 1002 (bandIdx = 7) the integer allocation envelope 1004 takes the integer value -17 (iEnv[7] = -17). The integer allocation envelope 1004 can be limited to a maximum value (referred to as iMax, e.g., iMax = -15). The bit allocation process can make use of a bit allocation formula that provides a quantizer index 1006 (referred to as iAlloc[bandIdx]) as a function of the integer allocation envelope 1004 and of the offset parameter (referred to as AllocOffset). As noted above, the offset parameter (i.e., AllocOffset) is transmitted to the corresponding decoder 500, thereby allowing the decoder 500 to determine the quantizer indices 1006 using the bit allocation formula. The bit allocation formula can be given as iAlloc[bandIdx] = iEnv[bandIdx] - (iMax - CONSTANT_OFFSET) + AllocOffset, where CONSTANT_OFFSET can be a constant offset, e.g., CONSTANT_OFFSET = 20.
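The bit allocation formula can be exercised directly, as in the following sketch; the clamping reflects the rounding rules stated after the worked example below, and the maximum quantizer index is an illustrative assumption not given in the text.

```python
IMAX = -15
CONSTANT_OFFSET = 20
MAX_QUANTIZER_INDEX = 31  # assumption for illustration

def quantizer_index(i_env, alloc_offset):
    i_alloc = i_env - (IMAX - CONSTANT_OFFSET) + alloc_offset
    return max(0, min(MAX_QUANTIZER_INDEX, i_alloc))  # round into the valid range

# Reproduces the worked example below: iEnv[7] = -17, AllocOffset = -13.
assert quantizer_index(-17, -13) == 5
```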
By way of example, if the bit allocation process has determined that the bit rate constraint can be met using an offset parameter AllocOffset = -13, the quantizer index 1007 of the 7th frequency band is obtained as iAlloc[7] = -17 - (-15 - 20) - 13 = 5. By using the above bit allocation formula for all frequency bands 302, the quantizer indices 1006 (and consequently the quantizers 321, 322, 323) for all frequency bands 302 can be determined. A quantizer index smaller than zero can be rounded up to a quantizer index of zero. In a similar manner, a quantizer index greater than the maximum available quantizer index can be rounded down to the maximum available quantizer index.

[00183] Furthermore, Figure 27 shows an example noise envelope 1011 that can be achieved using a quantization scheme described in this document. The noise envelope 1011 shows the envelope of the quantization noise that is introduced during quantization. If plotted together with the signal envelope (represented by the integer allocation envelope 1004 in Figure 27), the noise envelope 1011 illustrates the fact that the distribution of the quantization noise is perceptually optimized with respect to the signal envelope.

[00184] In order to allow a decoder 500 to synchronize with a received bit stream, different types of frames may be transmitted. A frame may correspond to a set 132, 332 of blocks, in particular to a shifted set 332 of blocks. In particular, so-called P-frames may be transmitted, which are encoded in a relative manner with respect to a previous frame. In the above description, the decoder 500 was assumed to be aware of the previously quantized envelope 135. The previously quantized envelope 135 may be provided within a previous frame, so that the current set 132, or the corresponding shifted set 332, may correspond to a P-frame. However, in a start-up scenario, the decoder 500 is typically not aware of the previously quantized envelope 135. For this purpose, an I-frame may be transmitted (e.g., at start-up, or on a regular basis). The I-frame may comprise two envelopes, one of which is used as the previously quantized envelope 135 and the other as the quantized current envelope 134. I-frames may be used for the start-up case of the speech spectral frontend (that is, of the transform-based speech decoder 500), for example when following a frame that employs a different audio encoding mode, and/or as a tool to explicitly enable a splicing point of the audio bit stream.

[00185] The operation of the subband estimator 517 is illustrated in Figure 23c. In the illustrated example, the estimator parameters 520 are a lag parameter T and an estimator gain parameter g. The estimator parameters 520 can be determined from the estimator data 164 using a predetermined table of possible values for the lag parameter and the estimator gain parameter. This enables a bit-rate-efficient transmission of the estimator parameters 520.

[00186] One or more previously decoded transform coefficient vectors (i.e., one or more previous blocks 149 of reconstructed coefficients) may be stored in a subband (or MDCT) signal buffer 541. The buffer 541 can be updated once per stride (e.g., every 5 ms). The estimator extractor 543 can be configured to operate on the buffer 541 in dependence on a normalized lag parameter T. The normalized lag parameter T can be determined by normalizing the lag parameter 520 to the stride units (e.g., to MDCT stride units).
If the lag parameter T is an integer, the extractor 543 can obtain the one or more previously decoded transform coefficient vectors situated T time units back in the buffer 541. In other words, the lag parameter T can be indicative of which of the one or more previous blocks 149 of reconstructed coefficients is to be used to determine the block 150 of estimated transform coefficients. A detailed discussion regarding a possible implementation of the extractor 543 is provided in Patent Application US61750052 and the patent applications claiming priority thereto, the contents of which are incorporated by reference.

[00187] The extractor 543 can operate on vectors (or blocks) that carry full signal envelopes. On the other hand, the block 150 of estimated transform coefficients (to be provided by the subband estimator 517) is represented in the flattened domain. Consequently, the output of the extractor 543 has to be flattened into a flattened domain vector. This can be achieved using a formatter 544 that makes use of the adjusted envelopes 139 of the one or more previous blocks 149 of reconstructed coefficients. The adjusted envelopes 139 of the one or more previous blocks 149 of reconstructed coefficients may be stored in an envelope buffer 542. The formatter unit 544 may be configured to obtain a delayed signal envelope, to be used in the flattening, from time unit T0 in the envelope buffer 542, where T0 is the integer closest to T. The flattened domain vector can then be scaled by the gain parameter g to yield the block 150 of estimated transform coefficients (in the flattened domain).
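The extractor/formatter path of paragraphs [00186] and [00187] can be sketched as follows, assuming an integer-valued lag for simplicity: the block lagged by T strides is fetched from the subband buffer 541, flattened by the delayed envelope from the envelope buffer 542 at time unit T0, and scaled by the estimator gain g. The buffer layout is an illustrative assumption.

```python
import numpy as np

def predict_flattened(subband_buffer, envelope_buffer, T, g):
    lag = int(round(T))                   # T0: the integer closest to T
    past_block = subband_buffer[-lag]     # block 149 from T strides back
    delayed_env = envelope_buffer[-lag]   # matching adjusted envelope 139
    flattened = past_block / delayed_env  # formatter 544: back to the flattened domain
    return g * flattened                  # block 150 of estimated coefficients

subband_buffer = [np.array([4.0, 2.0]), np.array([3.0, 1.0])]
envelope_buffer = [np.array([4.0, 2.0]), np.array([2.0, 2.0])]
pred = predict_flattened(subband_buffer, envelope_buffer, T=1.2, g=0.8)
```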
[00188] As an alternative, the delayed flattening process performed by the formatter 544 could be omitted by using a subband estimator 517 that operates in the flattened domain, e.g., a subband estimator 517 that operates on the blocks 148 of reconstructed flattened coefficients. However, it was found that the sequence of flattened domain vectors (or blocks) does not map well onto time signals, due to the time-aliased aspects of the transform (e.g., the MDCT). As a consequence, the fit to the underlying signal model of the extractor 543 is reduced, and a higher level of coding noise results from such an alternative structure. In other words, it was found that the signal models (e.g., sinusoidal or periodic models) used by the subband estimator 517 yield increased performance in the un-flattened domain (compared to the flattened domain).

[00189] It should be noted that, in an alternative example, the output of the estimator 517 (i.e., the block 150 of estimated transform coefficients) may be added to the output of the inverse flattening unit 114 (i.e., to the block 149 of reconstructed coefficients) (see Figure 23a). The formatter unit 544 of Figure 23c can then be configured to perform the combined operation of delayed flattening and inverse flattening.

[00190] Elements in the received bit stream may control the occasional flushing of the subband buffer 541 and of the envelope buffer 542, for example in the case of a first coding unit (i.e., a first block) of an I-frame. This enables the decoding of an I-frame without knowledge of previous data. The first coding unit will typically not be able to make use of a prediction contribution, but may nevertheless use a relatively small number of bits to carry the estimator information 520. The loss in prediction gain can be compensated by allocating more bits to the coding of the prediction error of this first coding unit. Typically, the estimator contribution is again substantial for the second coding unit (i.e., a second block) of an I-frame. Owing to these aspects, the quality can be maintained with a relatively small increase in bit rate, even with a relatively frequent use of I-frames.

[00191] In other words, the sets 132, 332 of blocks (also referred to as frames) comprise a plurality of blocks 131 that can be encoded using predictive encoding. When coding an I-frame, only the first block 203 of a set 332 of blocks cannot make use of the coding gain achieved by a predictive encoder. The directly following block 201 can already make use of the benefits of predictive coding. This means that the disadvantages of an I-frame with respect to coding efficiency are limited to the coding of the first block 203 of transform coefficients of the frame 332, and do not apply to the other blocks 201, 204, 205 of the frame 332. Therefore, the transform-based speech coding scheme described in this document allows a relatively frequent use of I-frames without a significant impact on coding efficiency. As such, the presently described transform-based speech coding scheme is particularly suitable for applications that require a relatively fast and/or relatively frequent synchronization between decoder and encoder.

[00192] Figure 23d shows a block diagram of an exemplary spectrum decoder 502. The spectrum decoder 502 comprises a lossless decoder 551 that is configured to decode the entropy-encoded coefficient data 163. Furthermore, the spectrum decoder 502 comprises an inverse quantizer 552 that is configured to assign coefficient values to the quantization indices comprised within the coefficient data 163. As noted in the context of the encoder 100, 170, different transform coefficients can be quantized using different quantizers selected from a set of predetermined quantizers, e.g., a finite set of model-based scalar quantizers. As shown in Figure 22, a set of quantizers 321, 322, 323 may comprise different types of quantizers. The set of quantizers may comprise a quantizer 321 that provides noise synthesis (in the case of zero bit rate), one or more dithered quantizers 322 (for relatively low signal-to-noise ratios, SNRs, and for intermediate bit rates) and/or one or more undithered quantizers 323 (for relatively high SNRs and for relatively high bit rates).

[00193] The envelope refinement unit 107 can be configured to provide the allocation envelope 138, which can be combined with the offset parameter comprised within the coefficient data 163 to yield an allocation vector. The allocation vector contains an integer value for each frequency band 302. The integer value for a particular frequency band 302 points to the rate-distortion point to be used for the inverse quantization of the transform coefficients of the particular frequency band 302. In other words, the integer value for the particular frequency band 302 points to the quantizer to be used for the inverse quantization of the transform coefficients of the particular frequency band 302. An increase of the integer value by one corresponds to a 1.5 dB increase in SNR. For the dithered quantizers 322 and the undithered quantizers 323, a Laplacian probability distribution model can be used in the lossless coding, which may employ arithmetic coding. One or more dithered quantizers 322 may be used to bridge the gap in a seamless manner between the low and high bit rate cases. The dithered quantizers 322 can be beneficial in creating a sufficiently smooth output audio quality for stationary noise-like signals.
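The composition of the quantizer set in paragraph [00192] can be illustrated by a simple SNR-driven selector; the SNR breakpoints below are assumptions chosen for illustration only.

```python
def build_quantizer_selector(dither_snr_range=(0.0, 10.0)):
    low, high = dither_snr_range
    def select(snr_db):
        if snr_db <= low:
            return "noise-synthesis quantizer 321"  # zero bit rate
        if snr_db < high:
            return "dithered quantizer 322"         # intermediate bit rates
        return "undithered quantizer 323"           # high SNR / high bit rate
    return select

select = build_quantizer_selector()
print(select(0.0), select(5.0), select(20.0))
```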
[00194] In other words, the inverse quantizer 552 can be configured to receive the coefficient quantization indices of a current block 131 of transform coefficients. The one or more coefficient quantization indices of a particular frequency band 302 were determined using a corresponding quantizer from a predetermined set of quantizers. The value of the allocation vector (which can be determined by offsetting the allocation envelope 138 by the offset parameter) for the particular frequency band 302 indicates the quantizer that was used to determine the one or more coefficient quantization indices of the particular frequency band 302. Having identified the quantizer, the one or more coefficient quantization indices can be inversely quantized to yield the block 145 of quantized error coefficients.

[00195] Furthermore, the spectrum decoder 502 may comprise an inverse scaling unit 113 to provide the block 147 of scaled quantized error coefficients. The additional tools and interconnections around the lossless decoder 551 and the inverse quantizer 552 of Figure 23d can be used to adapt the spectral decoding to its use within the overall decoder 500 shown in Figure 23a, in which the output of the spectrum decoder 502 (i.e., the block 145 of quantized error coefficients) is used to provide an additive correction to the predicted flattened domain vector (i.e., to the block 150 of estimated transform coefficients). In particular, the additional tools can ensure that the processing performed in the decoder 500 corresponds to the processing performed in the encoder 100, 170.

[00196] In particular, the spectrum decoder 502 may comprise a heuristic scaling unit 111. As shown in conjunction with the encoder 100, 170, the heuristic scaling unit 111 can have an impact on the bit allocation. In the encoder 100, 170, the current blocks 141 of prediction error coefficients are scaled up to unit variance by a heuristic rule. As a consequence, the default allocation could lead to a too fine quantization of the final downscaled output of the heuristic scaling unit 111. Hence, the allocation should be modified in a way similar to the modification of the prediction error coefficients.

[00197] However, as outlined below, it may be beneficial to avoid reducing the coding resources for one or more of the low-frequency bins (or low-frequency bands). In particular, this can be beneficial for countering an LF (low-frequency) rumble/noise artifact that occurs in most voiced situations (i.e., for signals that have a relatively large control parameter 146, rfu). As such, the bit allocation / quantizer selection in dependence on the control parameter 146, which is described below, can be considered a "voice adaptive LF quality amplifier".

[00198] The spectrum decoder may depend on the control parameter 146, named rfu, which is a limited version of the estimator gain g: rfu = min(1, max(g, 0)).

[00199] Using the control parameter 146, the set of quantizers used in the coefficient quantization unit 112 of the encoder 100, 170 and in the inverse quantizer 552 can be adapted. In particular, the noisiness of the set of quantizers can be adapted based on the control parameter 146. By way of example, a value of the control parameter 146, rfu, close to 1 can trigger a limitation of the range of allocation levels using dithered quantizers, and can trigger a reduction of the variance of the noise synthesis level. In one example, a dithering decision threshold at rfu = 0.75 and a noise gain equal to 1 - rfu can be set. The dithering adaptation can affect both the lossless decoding and the inverse quantizer, whereas the noise gain adaptation typically affects only the inverse quantizer.
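A minimal sketch of the rfu-driven adaptation of paragraph [00199] and of the threshold logic described below: the estimator gain g is limited to [0, 1], a value at or above the 0.75 threshold selects the second quantizer set 327, and the noise synthesis gain becomes 1 - rfu. The returned labels are placeholders for the actual quantizer sets.

```python
def control_parameter(g):
    return min(1.0, max(g, 0.0))  # rfu = min(1, max(g, 0))

def adapt_quantizers(g, dither_threshold=0.75):
    rfu = control_parameter(g)
    quantizer_set = "set 327" if rfu >= dither_threshold else "set 326"
    noise_gain = 1.0 - rfu        # reduced noise synthesis for voiced signals
    return quantizer_set, noise_gain

print(adapt_quantizers(0.9))  # voiced/tonal: second set, low noise gain
print(adapt_quantizers(0.2))  # noise-like: first set, higher noise gain
```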
[00200] The estimator contribution can be assumed to be substantial for voiced/tonal situations. As such, a relatively high estimator gain (i.e., a relatively high control parameter 146) may be indicative of a voiced or tonal speech signal. In such situations, the addition of noise, be it dither-related or explicit (the zero allocation case), was empirically shown to be counterproductive to the perceived quality of the encoded signal. As a consequence, the number of dithered quantizers 322 and/or the type of noise used for the noise synthesis quantizer 321 can be adapted based on the estimator gain, thereby improving the perceived quality of the encoded speech signal.

[00201] As such, the control parameter 146 can be used to modify the ranges 324, 325 of SNRs for which the dithered quantizers 322 are used. By way of example, if the control parameter 146, rfu, is below 0.75, the range 324 for the dithered quantizers can be used. In other words, if the control parameter 146 is below a predetermined threshold, the first set 326 of quantizers can be used. On the other hand, if the control parameter 146, rfu, is greater than or equal to 0.75, the range 325 for the dithered quantizers can be used. In other words, if the control parameter 146 is greater than or equal to the predetermined threshold, the second set 327 of quantizers can be used.

[00202] Furthermore, the control parameter 146 can be used for the modification of the variance and of the bit allocation. The reason for this is that a successful prediction will typically require a smaller correction, especially in the lower frequency range from 0 to 1 kHz. It may be advantageous to make the quantizer explicitly aware of this deviation from the unit-variance model, in order to free up coding resources for higher frequency bands 302.

EQUIVALENTS, EXTENSIONS, ALTERNATIVES AND MISCELLANEOUS

[00203] Further embodiments of the present invention will become apparent to a person skilled in the art after studying the description above. Even though the present description and drawings disclose embodiments and examples, the invention is not restricted to these specific examples. Numerous modifications and variations can be made without departing from the scope of the present invention, which is defined by the appended claims. Any reference signs appearing in the claims are not to be understood as limiting their scope.

[00204] The systems and methods disclosed hereinabove may be implemented as software, firmware, hardware, or a combination thereof. In a hardware implementation, the division of tasks between the functional units referred to in the above description does not necessarily correspond to the division into physical units; on the contrary, one physical component may have multiple functionalities, and one task may be carried out by several physical components in cooperation. Certain components or all components may be implemented as software executed by a digital signal processor or microprocessor, or be implemented as hardware or as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may comprise computer storage media (or non-transitory media) and communication media (or transitory media).
As is well known to a person skilled in the art, the term computer storage media includes both volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, it is well known to a person skilled in the art that communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and include any information delivery media.
Claims (15)

[0001] 1. Audio processing system (100) configured to accept an audio bitstream, the processing system characterized in that it comprises: a decoder (101) adapted to receive the bitstream and to output quantized spectral coefficients; a frontend component including: a dequantization stage (102) adapted to receive the quantized spectral coefficients and to output a first frequency domain representation of an intermediate signal; and an inverse transform stage (103) for receiving the first frequency domain representation of the intermediate signal and synthesizing, based thereon, a time domain representation of the intermediate signal; a processing stage including: an analysis filter bank (104) for receiving the time domain representation of the intermediate signal and outputting a second frequency domain representation of the intermediate signal; at least one processing component (105, 106, 107) for receiving said second frequency domain representation of the intermediate signal and outputting a frequency domain representation of a processed audio signal; and a synthesis filter bank (108) for receiving the frequency domain representation of the processed audio signal and outputting a time domain representation of the processed audio signal; and a sample rate converter (109) for receiving said time domain representation of the processed audio signal and outputting a reconstructed audio signal sampled at a target sampling frequency, wherein the respective internal sampling rates of the time domain representation of the intermediate audio signal and of the time domain representation of the processed audio signal are equal, and wherein said at least one processing component includes: a parametric upmix stage (106) for receiving a downmix signal with M channels and outputting, based thereon, a signal with N channels, wherein the parametric upmix stage is operable in at least one mode in which 1 ≤ M < N, associated with a delay, and a mode in which 1 ≤ M = N; and a first delay stage configured to incur a delay, when the parametric upmix stage is in the mode in which 1 ≤ M = N, compensating for the delay associated with the mode in which 1 ≤ M < N, so that the processing stage has a constant total delay independent of a current operating mode of the parametric upmix stage.

[0002] 2. Audio processing system according to claim 1, characterized in that the frontend component is operable in an audio mode and in a voice-specific mode, and in that a mode change of the frontend component from the audio mode to the voice-specific mode includes reducing a maximum frame length of the inverse transform stage.

[0003] 3. Audio processing system according to claim 2, characterized in that the sample rate converter is operable to provide a reconstructed audio signal sampled at a target sampling frequency that differs by up to 5% from the internal sampling rate of said time domain representation of the processed audio signal.

[0004] 4. Audio processing system according to claim 1, characterized in that it further comprises a bypass line arranged in parallel with the processing stage and comprising a second delay stage configured to incur a delay equal to the constant total delay of the processing stage.

[0005] 5. Audio processing system according to claim 1, characterized in that the parametric upmix stage is further operable in at least one mode in which M = 3 and N = 5.

[0006] 6.
Audio processing system according to claim 5, characterized in that the frontend component is configured, in this mode of the parametric upmix stage in which M = 3 and N = 5, to provide an intermediate signal comprising a downmix signal, wherein the frontend component derives two of the M = 3 channels from jointly coded channels in the audio bitstream.

[0007] 7. Audio processing system according to claim 1, characterized in that the at least one processing component further includes a spectral band replication module (106) arranged upstream of the parametric upmix stage and operable to reconstruct high-frequency content, wherein the spectral band replication module: is configured to be active at least in those modes of the parametric upmix stage in which M < N; and is operable irrespective of the current mode of the parametric upmix stage when the parametric upmix stage is in any of the modes in which M = N.

[0008] 8. Audio processing system according to claim 7, characterized in that the at least one processing component further includes a waveform coding stage (214 in Figure 8) arranged in parallel with or downstream of the parametric upmix stage and operable to enhance each of the N channels with waveform-coded low-frequency content, wherein the waveform coding stage can be toggled on and off independently of the current mode of the parametric upmix stage and of the spectral band replication module.

[0009] 9. Audio processing system according to claim 8, characterized in that it is operable in at least one decoding mode in which the parametric upmix stage is in an M = N mode with M > 2.

[0010] 10. Audio processing system according to claim 9, characterized in that it is operable in at least the following decoding modes: i) parametric upmix stage in M = N = 1 mode; ii) parametric upmix stage in M = N = 1 mode and active spectral band replication module; iii) parametric upmix stage in M = 1, N = 2 mode and active spectral band replication module; iv) parametric upmix stage in M = 1, N = 2 mode, active spectral band replication module and active waveform coding stage; v) parametric upmix stage in M = 2, N = 5 mode and active spectral band replication module; vi) parametric upmix stage in M = 2, N = 5 mode, active spectral band replication module and active waveform coding stage; vii) parametric upmix stage in M = 3, N = 5 mode and active spectral band replication module; viii) parametric upmix stage in M = N = 2 mode; ix) parametric upmix stage in M = N = 2 mode and active spectral band replication module; x) parametric upmix stage in M = N = 7 mode; xi) parametric upmix stage in M = N = 7 mode and active spectral band replication module.

[0011] 11. Audio processing system according to claim 1, characterized in that it further comprises the following components arranged downstream of the processing stage: a phase-shifting component configured to receive the time domain representation of the processed audio signal, in which at least one channel represents a surround channel, and to perform a 90-degree phase shift on said at least one surround channel; and a downmix component configured to receive the processed audio signal from the phase-shifting component and to output, based thereon, a two-channel downmix signal.

[0012] 12.
Audio processing system according to claim 1, characterized in that it further comprises an LFE decoder configured to prepare at least one additional channel based on the audio bitstream and to include said additional channel(s) in the reconstructed audio signal.

[0013] 13. Method for processing an audio bitstream, the method characterized in that it comprises: providing quantized spectral coefficients based on the bitstream; receiving the quantized spectral coefficients and performing inverse quantization followed by a frequency-to-time transformation, whereby a time domain representation of an intermediate audio signal is obtained; providing a frequency domain representation of the intermediate audio signal based on the time domain representation of the intermediate audio signal; providing a frequency domain representation of a processed audio signal by performing at least one processing step on the frequency domain representation of the intermediate audio signal; providing a time domain representation of the processed audio signal; and changing the sampling rate of the time domain representation of the processed audio signal to a target sampling frequency, whereby a reconstructed audio signal is obtained, wherein the respective internal sampling rates of the time domain representation of the intermediate audio signal and of the time domain representation of the processed audio signal are the same, wherein the method further comprises determining a current mode from among at least one mode in which 1 ≤ M < N, associated with a delay, and a mode in which 1 ≤ M = N, and wherein the at least one processing step includes: receiving a downmix signal with M channels and outputting, based thereon, a signal with N channels; and, in response to the current mode being the mode in which 1 ≤ M = N, incurring a delay to compensate for the delay associated with the mode in which 1 ≤ M < N, so that the total delay of the processing step is constant regardless of the current mode.

[0014] 14. Method according to claim 13, characterized in that the inverse quantization and/or the frequency-to-time transformation are performed on a hardware component operable in at least an audio mode and a voice-specific mode, in that a current mode is selected in accordance with metadata associated with the quantized spectral coefficients, and in that changing the mode from the audio mode to the voice-specific mode includes reducing a maximum frame length of the frequency-to-time transformation.

[0015] 15. Computer program product characterized in that it comprises a non-transitory computer-readable medium with instructions for carrying out the method as defined in claim 13.
Sulfonates, polymers, resist compositions and patterning process
Washing machine
Washing machine
Device for fixture finishing and tension adjusting of membrane
Structure for Equipping Band in a Plane Cathode Ray Tube
Process for preparation of 7 alpha-carboxyl 9, 11-epoxy steroids and intermediates useful therein an
国家/地区
|